Saved in:
| Main Authors: | Wu, Yihan, Lu, Yichen, Peng, Yifan, Wang, Xihua, Song, Ruihua, Watanabe, Shinji |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.19005 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
by: Wu, Yihan, et al.
Published: (2024)
by: Wu, Yihan, et al.
Published: (2024)
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
by: Wu, Yihan, et al.
Published: (2024)
by: Wu, Yihan, et al.
Published: (2024)
Neural Blind Source Separation and Diarization for Distant Speech Recognition
by: Bando, Yoshiaki, et al.
Published: (2024)
by: Bando, Yoshiaki, et al.
Published: (2024)
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024)
by: Lu, Yichen, et al.
Published: (2024)
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)
by: Shi, Jiatong, et al.
Published: (2025)
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
by: Chen, William, et al.
Published: (2025)
by: Chen, William, et al.
Published: (2025)
Aligning Text-to-Music Evaluation with Human Preferences
by: Huang, Yichen, et al.
Published: (2025)
by: Huang, Yichen, et al.
Published: (2025)
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
by: Cheng, Xin, et al.
Published: (2025)
by: Cheng, Xin, et al.
Published: (2025)
LoVA: Long-form Video-to-Audio Generation
by: Cheng, Xin, et al.
Published: (2024)
by: Cheng, Xin, et al.
Published: (2024)
Text-To-Speech Synthesis In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)
by: Jung, Jee-weon, et al.
Published: (2024)
Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks
by: Wang, Shih-Heng, et al.
Published: (2026)
by: Wang, Shih-Heng, et al.
Published: (2026)
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
by: Gao, Ming, et al.
Published: (2025)
by: Gao, Ming, et al.
Published: (2025)
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
by: Shakeel, Muhammad, et al.
Published: (2024)
by: Shakeel, Muhammad, et al.
Published: (2024)
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)
by: Jung, Jee-weon, et al.
Published: (2024)
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
by: Yeo, Jeong Hun, et al.
Published: (2023)
by: Yeo, Jeong Hun, et al.
Published: (2023)
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)
by: Peng, Yifan, et al.
Published: (2024)
Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)
by: Chen, William, et al.
Published: (2024)
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
by: Shakeel, Muhammad, et al.
Published: (2024)
by: Shakeel, Muhammad, et al.
Published: (2024)
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)
by: Xie, Jiamin, et al.
Published: (2025)
Enhancing Speech Emotion Recognition through Segmental Average Pooling of Self-Supervised Learning Features
by: Hyeon, Jonghwan, et al.
Published: (2024)
by: Hyeon, Jonghwan, et al.
Published: (2024)
Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)
by: Yeo, Yue Heng, et al.
Published: (2026)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)
by: Sudo, Yui, et al.
Published: (2024)
Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)
by: Kim, Jaeyoung, et al.
Published: (2024)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)
by: Wang, Helin, et al.
Published: (2025)
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing
by: Shim, Hye-jin, et al.
Published: (2024)
by: Shim, Hye-jin, et al.
Published: (2024)
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels
by: Prabhu, Darshan, et al.
Published: (2024)
by: Prabhu, Darshan, et al.
Published: (2024)
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)
by: Choi, Yerin, et al.
Published: (2024)
Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations
by: Chen, Jinming, et al.
Published: (2025)
by: Chen, Jinming, et al.
Published: (2025)
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
by: Jin, Zengrui, et al.
Published: (2022)
by: Jin, Zengrui, et al.
Published: (2022)
GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)
by: Pan, Yu, et al.
Published: (2024)
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
by: Someki, Masao, et al.
Published: (2024)
by: Someki, Masao, et al.
Published: (2024)
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)
by: Liu, Hexin, et al.
Published: (2024)
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)
by: Jiao, Xinxin, et al.
Published: (2024)
Discrete Speech Unit Extraction via Independent Component Analysis
by: Nakamura, Tomohiko, et al.
Published: (2025)
by: Nakamura, Tomohiko, et al.
Published: (2025)
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
by: Sudo, Yui, et al.
Published: (2024)
by: Sudo, Yui, et al.
Published: (2024)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
by: Kang, Jiawen, et al.
Published: (2024)
TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)
by: Barovic, Andrew, et al.
Published: (2025)
Open Source State-Of-the-Art Solution for Romanian Speech Recognition
by: Pirlogeanu, Gabriel, et al.
Published: (2025)
by: Pirlogeanu, Gabriel, et al.
Published: (2025)
Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
by: Pokel, Niclas, et al.
Published: (2025)
by: Pokel, Niclas, et al.
Published: (2025)
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
by: Guo, Yiwei, et al.
Published: (2024)
by: Guo, Yiwei, et al.
Published: (2024)
Similar Items
-
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
by: Wu, Yihan, et al.
Published: (2024) -
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
by: Wu, Yihan, et al.
Published: (2024) -
Neural Blind Source Separation and Diarization for Distant Speech Recognition
by: Bando, Yoshiaki, et al.
Published: (2024) -
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
by: Lu, Yichen, et al.
Published: (2024) -
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)