:: Library Catalog

Bibliographic Details
Main Authors:	Bando, Yoshiaki; Nakamura, Tomohiko; Watanabe, Shinji
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.08396

Similar Items

Discrete Speech Unit Extraction via Independent Component Analysis
by: Nakamura, Tomohiko, et al.
Published: (2025)

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
by: Gao, Ming, et al.
Published: (2025)

Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation
by: Saijo, Kohei, et al.
Published: (2025)

Input-Adaptive Spectral Feature Compression by Sequence Modeling for Source Separation
by: Saijo, Kohei, et al.
Published: (2026)

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
by: Cornell, Samuele, et al.
Published: (2024)

Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
by: Wu, Yihan, et al.
Published: (2024)

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)

Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
by: Fujita, Yoto, et al.
Published: (2024)

A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks
by: Wang, Shih-Heng, et al.
Published: (2026)

Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)

Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
by: Kim, Miseul, et al.
Published: (2025)

Open Source State-Of-the-Art Solution for Romanian Speech Recognition
by: Pirlogeanu, Gabriel, et al.
Published: (2025)

SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer
by: Di Carlo, Diego, et al.
Published: (2025)

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
by: von Neumann, Thilo, et al.
Published: (2023)

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
by: Shi, Hao, et al.
Published: (2024)

Blind Separation of Vibration Sources using Deep Learning and Deconvolution
by: Makienko, Igor, et al.
Published: (2024)

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
by: Yeo, Jeong Hun, et al.
Published: (2023)

Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
by: Chen, William, et al.
Published: (2025)

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
by: Shakeel, Muhammad, et al.
Published: (2025)

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)

Subspace Track-before-Detect for Passive Multi-Target Tracking with Unknown Emitted Signals
by: Ito, Nobutaka, et al.
Published: (2026)

Text-To-Speech Synthesis In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)

MOSS Transcribe Diarize Technical Report
by: AI, MOSI., et al.
Published: (2026)

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
by: Take, Osamu, et al.
Published: (2024)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection
by: Fujita, Yoto, et al.
Published: (2024)

MAPSS: Manifold-based Assessment of Perceptual Source Separation
by: Ivry, Amir, et al.
Published: (2025)

Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)

Self-Supervised Speech Representations are More Phonetic than Semantic
by: Choi, Kwanghee, et al.
Published: (2024)

Source Separation & Automatic Transcription for Music
by: Derby, Bradford, et al.
Published: (2024)

User-guided Generative Source Separation
by: Wen, Yutong, et al.
Published: (2025)

VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification
by: Wang, Pengyu, et al.
Published: (2025)

Study of the Performance of CEEMDAN in Underdetermined Speech Separation
by: Melhem, Rawad, et al.
Published: (2024)

TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)