:: Library Catalog

Bibliographic Details
Main Authors:	Rozenfeld, Vadim; Goldshtein, Bracha Laufer
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2409.11804

Similar Items

Uncertainty Quantification and Risk Control for Multi-Speaker Sound Source Localization
by: Rozenfeld, Vadim, et al.
Published: (2026)

MAPSS: Manifold-based Assessment of Perceptual Source Separation
by: Ivry, Amir, et al.
Published: (2025)

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
by: Yemini, Yochai, et al.
Published: (2023)

CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge
by: Yin, Jun, et al.
Published: (2025)

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
by: Huang, He, et al.
Published: (2024)

Where's That Voice Coming? Continual Learning for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

An Efficient GPU-based Implementation for Noise Robust Sound Source Localization
by: Lin, Zirui, et al.
Published: (2025)

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
by: Qian, Xinyuan, et al.
Published: (2024)

TF-Mamba: A Time-Frequency Network for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

Steered Response Power for Sound Source Localization: A Tutorial Review
by: Grinstein, Eric, et al.
Published: (2024)

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
by: Lee, Sungho, et al.
Published: (2024)

Determined Blind Source Separation with Sinkhorn Divergence-based Optimal Allocation of the Source Power
by: Wang, Jianyu, et al.
Published: (2025)

Conformer-based Ultrasound-to-Speech Conversion
by: Ibrahimov, Ibrahim, et al.
Published: (2025)

IPDnet: A Universal Direct-Path IPD Estimation Network for Sound Source Localization
by: Wang, Yabo, et al.
Published: (2024)

A Steered Response Power Method for Sound Source Localization With Generic Acoustic Models
by: Müller, Kaspar, et al.
Published: (2025)

Combining Audio and Non-Audio Inputs in Evolved Neural Networks for Ovenbird
by: Hernandez, Sergio Poo, et al.
Published: (2025)

Efficient Area-based and Speaker-Agnostic Source Separation
by: Strauss, Martin, et al.
Published: (2024)

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
by: Tang, Yuxun, et al.
Published: (2024)

Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN
by: Zhang, Shiqi, et al.
Published: (2024)

SYKI-SVC: Advancing Singing Voice Conversion with Post-Processing Innovations and an Open-Source Professional Testset
by: Zhou, Yiquan, et al.
Published: (2025)

Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation
by: Comanducci, Luca, et al.
Published: (2024)

Diffusion based Text-to-Music Generation with Global and Local Text based Conditioning
by: Zhang, Jisi, et al.
Published: (2025)

Binaural Sound Event Localization and Detection Neural Network based on HRTF Localization Cues for Humanoid Robots
by: Lee, Gyeong-Tae
Published: (2025)

Reference Microphone Selection for Guided Source Separation based on the Normalized L-p Norm
by: Lohmann, Anselm, et al.
Published: (2025)

Determined Multichannel Blind Source Separation with Clustered Source Model
by: Wang, Jianyu, et al.
Published: (2024)

DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)

ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
by: Le, Khanh, et al.
Published: (2025)

Single-Microphone-Based Sound Source Localization for Mobile Robots in Reverberant Environments
by: Wang, Jiang, et al.
Published: (2025)

Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
by: Honda, Tomoki, et al.
Published: (2024)

Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots
by: Lee, Gyeong-Tae, et al.
Published: (2025)

Noise-Robust Contrastive Learning with an MFCC-Conformer For Coronary Artery Disease Detection
by: Marocchi, Milan, et al.
Published: (2026)

Source Separation by Flow Matching
by: Scheibler, Robin, et al.
Published: (2025)

Source Verification for Speech Deepfakes
by: Negroni, Viola, et al.
Published: (2025)

Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech
by: Nakazawa, Kazushi
Published: (2026)

A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
by: Saijo, Kohei, et al.
Published: (2025)

AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers
by: Fu, Linya, et al.
Published: (2025)

WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Peservation
by: Han, Lu, et al.
Published: (2025)

Source Tracing of Audio Deepfake Systems
by: Klein, Nicholas, et al.
Published: (2024)

Task-Aware Unified Source Separation
by: Saijo, Kohei, et al.
Published: (2024)

Fast Algorithm for Moving Sound Source
by: Yang, Dong
Published: (2025)