Staff View: Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR :: Library Catalog

Bibliographic Details
Main Authors:	Leang, Sotheara; Castelli, Éric; Vaufreydaz, Dominique; Sam, Sethserey
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language; Sound; Signal Processing
Online Access:	https://arxiv.org/abs/2507.22964

_version_	1866918108058353664
author	Leang, Sotheara; Castelli, Éric; Vaufreydaz, Dominique; Sam, Sethserey
author_facet	Leang, Sotheara; Castelli, Éric; Vaufreydaz, Dominique; Sam, Sethserey
contents	The dynamic characteristics of the speech signal provide temporal information and play an important role in enhancing Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of the speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. The SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe the tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_22964
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR Leang, Sotheara Castelli, Éric Vaufreydaz, Dominique Sam, Sethserey Audio and Speech Processing Computation and Language Sound Signal Processing The dynamic characteristics of the speech signal provide temporal information and play an important role in enhancing Automatic Speech Recognition (ASR). In this work, we characterized the acoustic transitions in a ratio plane of Spectral Subband Centroid Frequencies (SSCFs) using polar parameters to capture the dynamic characteristics of the speech and minimize spectral variation. These dynamic parameters were combined with Mel-Frequency Cepstral Coefficients (MFCCs) in Vietnamese ASR to capture more detailed spectral information. The SSCF0 was used as a pseudo-feature for the fundamental frequency (F0) to describe the tonal information robustly. The findings showed that the proposed parameters significantly reduce word error rates and exhibit greater gender independence than the baseline MFCCs.
title	Exploring Dynamic Parameters for Vietnamese Gender-Independent ASR
topic	Audio and Speech Processing; Computation and Language; Sound; Signal Processing
url	https://arxiv.org/abs/2507.22964

Similar Items