:: Library Catalog

Bibliographic Details
Main Authors:	Ishida, Shoma; Ono, Satoshi
Format:	Preprint
Published:	2020
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2012.11138

Similar Items

Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation
by: Sedukhin, Oleg, et al.
Published: (2026)

Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)

Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)

AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models
by: Abreu, Wallace, et al.
Published: (2024)

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
by: Pandey, Rahul, et al.
Published: (2023)

Explainable speech emotion recognition through attentive pooling: insights from attention-based temporal localization
by: Leygue, Tahitoa, et al.
Published: (2025)

Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)

Deep learning-based filtering of cross-spectral matrices using generative adversarial networks
by: Puhle, Christof
Published: (2025)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition
by: Kwak, Doyeop, et al.
Published: (2026)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)

Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
by: Dhakal, Manish, et al.
Published: (2024)

Learning discriminative features from spectrograms using center loss for speech emotion recognition
by: Dai, Dongyang, et al.
Published: (2025)

Prosodic Parameter Manipulation in TTS generated speech for Controlled Speech Generation
by: Chary, Podakanti Satyajith
Published: (2024)

Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

Self-consistent context aware conformer transducer for speech recognition
by: Kolokolov, Konstantin, et al.
Published: (2024)

LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)

Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)

An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
by: Li, Xuyuan, et al.
Published: (2023)

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)

Audio Spotforming Using Nonnegative Tensor Factorization with Attractor-Based Regularization
by: Ayano, Shoma, et al.
Published: (2024)

Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)

ConVoiFilter: A case study of doing cocktail party speech recognition
by: Nguyen, Thai-Binh, et al.
Published: (2023)

TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
by: Jeong, Jaeseok, et al.
Published: (2025)