| Main Authors: | Leang, Sotheara; Castelli, Éric; Vaufreydaz, Dominique; Sam, Sethserey |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Audio and Speech Processing; Computation and Language; Sound; Signal Processing |
| Online Access: | https://arxiv.org/abs/2507.22964 |
| _version_ | 1866918108058353664 |
|---|---|
| author | Leang, Sotheara; Castelli, Éric; Vaufreydaz, Dominique; Sam, Sethserey |
| author_facet | Leang, Sotheara; Castelli, Éric; Vaufreydaz, Dominique; Sam, Sethserey |
| contents | The dynamic characteristics of the speech signal provide temporal information and play an important role in enhancing Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2507_22964 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR |
| topic | Audio and Speech Processing; Computation and Language; Sound; Signal Processing |
| url | https://arxiv.org/abs/2507.22964 |
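
The contents field of this record mentions Spectral Subband Centroid Frequencies (SSCFs) and polar parameters that describe acoustic transitions in a ratio plane. As a rough illustration only, and not the authors' implementation, the sketch below computes a subband centroid as the power-weighted mean frequency of a band, then expresses the movement between two points of a 2-D plane as a radius and an angle. The function names, the band limits passed in, and the choice of which centroid ratios span the plane (`ratio_point`) are all assumptions made for this example; the record does not specify them.

```python
import math


def subband_centroid(freqs, power, lo, hi):
    """Power-weighted mean frequency of the bins with lo <= f < hi."""
    num = 0.0
    den = 0.0
    for f, p in zip(freqs, power):
        if lo <= f < hi:
            num += f * p
            den += p
    if den == 0.0:
        # Silent or empty band: fall back to the band centre (an assumption).
        return (lo + hi) / 2.0
    return num / den


def ratio_point(c1, c2, c3):
    """Hypothetical ratio plane: one point built from three centroids.

    Which ratios the paper actually uses is not stated in this record.
    """
    return (c1 / c2, c2 / c3)


def polar_transition(p0, p1):
    """Radius and angle (radians) of the displacement from p0 to p1."""
    dx = p1[0] - p0[0]
    dy = p1[1] - p0[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)
```

Given per-frame centroids, one would call `ratio_point` on each frame and `polar_transition` on consecutive points; the radius then reflects how fast the spectrum moves and the angle reflects the direction of the movement in that plane.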