| Main Authors: | Ishida, Shoma; Ono, Satoshi |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2012.11138 |
Similar Items
Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation
by: Sedukhin, Oleg, et al.
Published: (2026)
Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)
Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)
Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)
AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models
by: Abreu, Wallace, et al.
Published: (2024)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)
PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
by: Pandey, Rahul, et al.
Published: (2023)
Explainable speech emotion recognition through attentive pooling: insights from attention-based temporal localization
by: Leygue, Tahitoa, et al.
Published: (2025)
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)
Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)
Deep learning-based filtering of cross-spectral matrices using generative adversarial networks
by: Puhle, Christof
Published: (2025)
Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)
LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition
by: Kwak, Doyeop, et al.
Published: (2026)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)
Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
by: Dhakal, Manish, et al.
Published: (2024)
Learning discriminative features from spectrograms using center loss for speech emotion recognition
by: Dai, Dongyang, et al.
Published: (2025)
Prosodic Parameter Manipulation in TTS generated speech for Controlled Speech Generation
by: Chary, Podakanti Satyajith
Published: (2024)
Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
Self-consistent context aware conformer transducer for speech recognition
by: Kolokolov, Konstantin, et al.
Published: (2024)
LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)
Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)
An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
by: Li, Xuyuan, et al.
Published: (2023)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)
Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization
by: Ayano, Shoma, et al.
Published: (2024)
Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)
Convoifilter: A case study of doing cocktail party speech recognition
by: Nguyen, Thai-Binh, et al.
Published: (2023)
TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
by: Jeong, Jaeseok, et al.
Published: (2025)