Bibliographic Details
Main Authors: Leang, Sotheara, Castelli, Éric, Vaufreydaz, Dominique, Sam, Sethserey
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing, Computation and Language, Sound, Signal Processing
Online Access: https://arxiv.org/abs/2507.22964
author Leang, Sotheara
Castelli, Éric
Vaufreydaz, Dominique
Sam, Sethserey
contents The dynamic characteristics of the speech signal provide temporal information and play an important role in improving Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs.
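The abstract's pipeline (subband centroids per frame, a ratio plane built from them, and polar parameters describing frame-to-frame transitions) can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes rectangular subbands with hypothetical edges, a power-weighted centroid per band, a hypothetical ratio plane with axes SSCF2/SSCF1 and SSCF3/SSCF2, and polar parameters taken as the magnitude and angle of the displacement between consecutive points. All function names are illustrative only.

```python
import math


def subband_centroid(freqs, power, lo, hi, gamma=1.0):
    """Power-weighted centroid frequency (Hz) of the bins in [lo, hi).

    Assumption: rectangular band; falls back to the band midpoint if the
    band carries no energy (a hypothetical choice for this sketch).
    """
    num = 0.0
    den = 0.0
    for f, p in zip(freqs, power):
        if lo <= f < hi:
            w = p ** gamma
            num += f * w
            den += w
    return num / den if den > 0 else (lo + hi) / 2.0


def sscf_vector(freqs, power, edges):
    """SSCF0..SSCFn for one frame, given hypothetical band edges in Hz."""
    return [subband_centroid(freqs, power, edges[i], edges[i + 1])
            for i in range(len(edges) - 1)]


def ratio_point(sscf):
    """Hypothetical ratio-plane coordinates: (SSCF2/SSCF1, SSCF3/SSCF2)."""
    return (sscf[2] / sscf[1], sscf[3] / sscf[2])


def polar_transitions(points):
    """Magnitude and angle (radians) of each step between consecutive points."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        out.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return out
```

Because the polar parameters are computed on ratios of centroids rather than on absolute frequencies, a uniform scaling of all centroids (as between speakers with different vocal-tract lengths) leaves the ratio-plane points unchanged in this sketch. That is one plausible reading of the gender-independence claim, stated here as an assumption.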
format Preprint
id arxiv_https___arxiv_org_abs_2507_22964
institution arXiv
publishDate 2025
record_format arxiv
title Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR
topic Audio and Speech Processing
Computation and Language
Sound
Signal Processing
url https://arxiv.org/abs/2507.22964