:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Eisenberg, Aviad, Gannot, Sharon, Chazan, Shlomo E.
Format:	Preprint
Published:	2025
Subjects:	Sound Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.06285
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)

Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
by: Opochinsky, Renana, et al.
Published: (2024)

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)

Binaural Target Speaker Extraction using Individualized HRTF
by: Ellinson, Yoav, et al.
Published: (2025)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation
by: Ellinson, Yoav, et al.
Published: (2026)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment
by: Cohen, Ohad, et al.
Published: (2024)

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
by: Peng, Jing, et al.
Published: (2026)

DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models
by: Della Torre, Sagi, et al.
Published: (2025)

An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)

On the Usefulness of Diffusion-Based Room Impulse Response Interpolation to Microphone Array Processing
by: Della Torre, Sagi, et al.
Published: (2026)

DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion
by: Yu, Yinfeng, et al.
Published: (2025)

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)

DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition
by: Liu, HongYu, et al.
Published: (2025)

Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing
by: Sebastian, Rinku, et al.
Published: (2026)

MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification
by: Jinghua, Liang, et al.
Published: (2026)

Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)

An End-to-End Approach for Chord-Conditioned Song Generation
by: Gao, Shuochen, et al.
Published: (2024)

Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP
by: Karchkhadze, Tornike, et al.
Published: (2026)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)

TFGA-Net: Temporal-Frequency Graph Attention Network for Brain-Controlled Speaker Extraction
by: Si, Youhao, et al.
Published: (2025)

Multi-Microphone Speech Emotion Recognition using the Hierarchical Token-semantic Audio Transformer Architecture
by: Cohen, Ohad, et al.
Published: (2024)

Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
by: Moon, Junwon, et al.
Published: (2026)

End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
by: V, Kesavaraj, et al.
Published: (2024)

SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
by: Lee, Sangmin, et al.
Published: (2025)

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow
by: Li, Duojia, et al.
Published: (2026)

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model
by: Peng, Shuhai, et al.
Published: (2026)

End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
by: Zeng, Wei, et al.
Published: (2024)

Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

Retrieval Augmented End-to-End Spoken Dialog Models
by: Wang, Mingqiu, et al.
Published: (2024)

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
by: Dai, Dongyang, et al.
Published: (2025)

Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)

Modality-Specific Speech Enhancement and Noise-Adaptive Fusion for Acoustic and Body-Conduction Microphone Framework
by: Kim, Yunsik, et al.
Published: (2025)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

Rethinking Leveraging Pre-Trained Multi-Layer Representations for Speaker Verification
by: Kim, Jin Sob, et al.
Published: (2025)