| Main Authors: | Bando, Yoshiaki, Nakamura, Tomohiko, Watanabe, Shinji |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.08396 |
Similar Items
Discrete Speech Unit Extraction via Independent Component Analysis
by: Nakamura, Tomohiko, et al.
Published: (2025)
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
by: Gao, Ming, et al.
Published: (2025)
Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation
by: Saijo, Kohei, et al.
Published: (2025)
Input-Adaptive Spectral Feature Compression by Sequence Modeling for Source Separation
by: Saijo, Kohei, et al.
Published: (2026)
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
by: Cornell, Samuele, et al.
Published: (2024)
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
by: Wu, Yihan, et al.
Published: (2024)
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
by: Fujita, Yoto, et al.
Published: (2024)
A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)
Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks
by: Wang, Shih-Heng, et al.
Published: (2026)
Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)
Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation
by: Kim, Miseul, et al.
Published: (2025)
Open Source State-Of-the-Art Solution for Romanian Speech Recognition
by: Pirlogeanu, Gabriel, et al.
Published: (2025)
SHAMaNS: Sound Localization with Hybrid Alpha-Stable Spatial Measure and Neural Steerer
by: Di Carlo, Diego, et al.
Published: (2025)
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
by: von Neumann, Thilo, et al.
Published: (2023)
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
by: Shi, Hao, et al.
Published: (2024)
Blind Separation of Vibration Sources using Deep Learning and Deconvolution
by: Makienko, Igor, et al.
Published: (2024)
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
by: Yeo, Jeong Hun, et al.
Published: (2023)
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
by: Chen, William, et al.
Published: (2025)
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
by: Shakeel, Muhammad, et al.
Published: (2025)
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024)
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)
Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)
Subspace Track-before-Detect for Passive Multi-Target Tracking with Unknown Emitted Signals
by: Ito, Nobutaka, et al.
Published: (2026)
Text-To-Speech Synthesis In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)
MOSS Transcribe Diarize Technical Report
by: AI, MOSI., et al.
Published: (2026)
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
by: Take, Osamu, et al.
Published: (2024)
From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)
DOA-Aware Audio-Visual Self-Supervised Learning for Sound Event Localization and Detection
by: Fujita, Yoto, et al.
Published: (2024)
MAPSS: Manifold-based Assessment of Perceptual Source Separation
by: Ivry, Amir, et al.
Published: (2025)
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)
SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)
Self-Supervised Speech Representations are More Phonetic than Semantic
by: Choi, Kwanghee, et al.
Published: (2024)
Source Separation & Automatic Transcription for Music
by: Derby, Bradford, et al.
Published: (2024)
User-guided Generative Source Separation
by: Wen, Yutong, et al.
Published: (2025)
VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification
by: Wang, Pengyu, et al.
Published: (2025)
Study of the Performance of CEEMDAN in Underdetermined Speech Separation
by: Melhem, Rawad, et al.
Published: (2024)
TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)