Speech Style Modeling Method Using Mutual Information for End-to-End Speech Synthesis 


Vol. 44, No. 9, pp. 1641-1647, Sep. 2019
10.7840/kics.2019.44.9.1641


  Abstract

In this paper, we propose a novel style modeling method using mutual information (MI) for end-to-end speech synthesis. An MI loss term is added to the objective function so that the style embedding carries more information about the target style and less information about the text. To estimate MI with neural networks, we adopt the mutual information neural estimator (MINE). The proposed method was trained on the VCTK database and was shown to outperform the conventional Tacotron-based Global Style Token method in both speech quality and style similarity.
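
The MI estimate mentioned above can be illustrated with a small sketch. MINE estimates MI by maximizing the Donsker-Varadhan lower bound, I(X; Z) >= E_p(x,z)[T(x, z)] - log E_p(x)p(z)[exp(T(x, z))], over a critic T that is a trained neural network. The code below is not the authors' implementation: to stay self-contained it replaces the trained critic with the closed-form optimal critic for a toy pair of correlated Gaussians, and the names `dv_lower_bound`, `gaussian_log_ratio`, and `demo` are hypothetical.

```python
import math
import random


def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on MI computed from critic scores.

    t_joint:    scores T(x, z) on paired samples drawn from p(x, z)
    t_marginal: scores T(x, z') on samples drawn from p(x)p(z)
    """
    # log E[exp(T)] evaluated stably with the log-sum-exp trick.
    m = max(t_marginal)
    mean_exp = sum(math.exp(t - m) for t in t_marginal) / len(t_marginal)
    log_mean_exp = m + math.log(mean_exp)
    return sum(t_joint) / len(t_joint) - log_mean_exp


def gaussian_log_ratio(x, z, rho):
    """log p(x, z) / (p(x) p(z)) for standard Gaussians with correlation rho.

    Assumption: this closed form stands in for the neural critic that
    MINE would train; it is the optimal critic for this toy case only.
    """
    quad = (x * x - 2 * rho * x * z + z * z) / (2 * (1 - rho * rho))
    return -0.5 * math.log(1 - rho * rho) - quad + 0.5 * (x * x + z * z)


def demo(rho=0.5, n=20000, seed=0):
    """Return (estimated MI, analytic MI) for the toy Gaussian pair."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
    # Shuffling z breaks the pairing, approximating samples from p(x)p(z).
    zs_shuffled = zs[:]
    rng.shuffle(zs_shuffled)
    t_joint = [gaussian_log_ratio(x, z, rho) for x, z in zip(xs, zs)]
    t_marginal = [gaussian_log_ratio(x, z, rho) for x, z in zip(xs, zs_shuffled)]
    true_mi = -0.5 * math.log(1 - rho * rho)
    return dv_lower_bound(t_joint, t_marginal), true_mi


if __name__ == "__main__":
    estimate, true_mi = demo()
    print(f"DV estimate: {estimate:.4f}, analytic MI: {true_mi:.4f}")
```

In a training setup of the kind the abstract describes, such an estimate would enter the objective as a weighted extra term alongside the synthesis loss. The weighting and the exact variable pairs used by the authors are not stated here, so they are left out of the sketch.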



  Cite this article

[IEEE Style]

J. Y. Lee, S. J. Cheon, B. J. Choi, N. S. Kim, D. H. Hong, "Speech Style Modeling Method Using Mutual Information for End-to-End Speech Synthesis," The Journal of Korean Institute of Communications and Information Sciences, vol. 44, no. 9, pp. 1641-1647, 2019. DOI: 10.7840/kics.2019.44.9.1641.

[ACM Style]

Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim, and Doo Hwa Hong. 2019. Speech Style Modeling Method Using Mutual Information for End-to-End Speech Synthesis. The Journal of Korean Institute of Communications and Information Sciences, 44, 9, (2019), 1641-1647. DOI: 10.7840/kics.2019.44.9.1641.

[KICS Style]

Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim, Doo Hwa Hong, "Speech Style Modeling Method Using Mutual Information for End-to-End Speech Synthesis," The Journal of Korean Institute of Communications and Information Sciences, vol. 44, no. 9, pp. 1641-1647, 9. 2019. (https://doi.org/10.7840/kics.2019.44.9.1641)