| Main Authors: | Kapu, Nirmal Joshua; Karan, Raghav |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Sound; Artificial Intelligence; Computation and Language; Audio and Speech Processing |
| Online Access: | https://arxiv.org/abs/2411.18636 |

| _version_ | 1866917850848952320 |
|---|---|
| author | Kapu, Nirmal Joshua; Karan, Raghav |
| author_facet | Kapu, Nirmal Joshua; Karan, Raghav |
| contents | This article surveys convolution-based models, including convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models, and provides their statistical background along with their applications in speech recognition, speaker identification, emotion recognition, and speech enhancement. Through a comparative assessment of training cost, model size, accuracy, and speed, we compare the strengths and weaknesses of each model, identify potential errors, and propose avenues for further research, emphasizing the central role these architectures play in advancing applications of speech technologies. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2411_18636 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and its Applications |
| topic | Sound; Artificial Intelligence; Computation and Language; Audio and Speech Processing |
| url | https://arxiv.org/abs/2411.18636 |