Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction 


Vol. 35, No. 7, pp. 604-610, Jul. 2010


  Abstract

In this paper, we propose FSFN (Filter bank sub-band energy subtraction based CLSFN), a method that improves the recognition performance of the existing CLSFN (Cepstral distance and Log-energy based Silence Feature Normalization). FSFN reduces the energy of noise components in the filter bank sub-band domain during feature extraction. This yields enhanced cepstral features, which in turn improve the accuracy of speech/silence classification. FSFN is therefore expected to outperform the existing CLSFN. Experiments on the Aurora 2.0 database show that FSFN improves the average word accuracy by 2% over the conventional CLSFN method, and that FSFN combined with CMVN (Cepstral Mean and Variance Normalization) achieves the best recognition performance among the methods compared.
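The abstract does not spell out the subtraction rule, so the following is only a minimal Python sketch of the general idea: subtract a per-band noise estimate from the filter bank energies, derive cepstral features from the result, and then apply CMVN. The function names (`subtract_subband_noise`, `cepstra`, `cmvn`), the use of the leading frames as the noise estimate, and the flooring factor are all assumptions made for illustration, not the authors' procedure.

```python
import math


def subtract_subband_noise(fbank, noise_frames=10, floor=0.01):
    """Subtract a per-band noise energy estimate from filter bank energies.

    fbank: list of frames, each a list of non-negative sub-band energies.
    Assumption: the first `noise_frames` frames contain noise only.
    Assumption: results are floored at `floor` times the original energy.
    """
    n_noise = min(noise_frames, len(fbank))
    n_bands = len(fbank[0])
    # Average energy per sub-band over the assumed noise-only frames.
    noise = [sum(frame[b] for frame in fbank[:n_noise]) / n_noise
             for b in range(n_bands)]
    return [[max(e - n, floor * e) for e, n in zip(frame, noise)]
            for frame in fbank]


def cepstra(fbank, n_ceps=13, eps=1e-10):
    """Log-compress each frame and apply a DCT-II to get cepstral features."""
    n_bands = len(fbank[0])
    n_out = min(n_ceps, n_bands)
    features = []
    for frame in fbank:
        logs = [math.log(max(e, eps)) for e in frame]
        features.append([
            sum(logs[b] * math.cos(math.pi * k * (b + 0.5) / n_bands)
                for b in range(n_bands))
            for k in range(n_out)
        ])
    return features


def cmvn(features, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
            for d in range(dims)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
            for f in features]


if __name__ == "__main__":
    # Toy utterance: two noise-only frames followed by one louder frame.
    toy = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.5]]
    cleaned = subtract_subband_noise(toy, noise_frames=2)
    normalized = cmvn(cepstra(cleaned))
    print(cleaned)
    print(normalized)
```

Flooring instead of clipping at zero keeps the logarithm well defined when the noise estimate exceeds a frame's energy. A speech/silence decision based on cepstral distance and log-energy, as in CLSFN, would operate on the features computed after the subtraction step; that stage is not sketched here because the abstract gives no details of it.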



  Cite this article

[IEEE Style]

G. Shen, S. Choi, H. Chung, "Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction," The Journal of Korean Institute of Communications and Information Sciences, vol. 35, no. 7, pp. 604-610, 2010.

[ACM Style]

Guanghu Shen, Sook-Nam Choi, and Hyun-Yeol Chung. 2010. Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction. The Journal of Korean Institute of Communications and Information Sciences, 35, 7, (2010), 604-610.

[KICS Style]

Guanghu Shen, Sook-Nam Choi, Hyun-Yeol Chung, "Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction," The Journal of Korean Institute of Communications and Information Sciences, vol. 35, no. 7, pp. 604-610, 7. 2010.