Monaural Music-Speech Source Separation Based on Convolutional Neural Network for Background Music Identification in TV Shows 


Vol. 45,  No. 5, pp. 855-866, May  2020
10.7840/kics.2020.45.5.855


  Abstract

Music identification technology is relatively mature when the input is clean music. However, its performance drops drastically when the background music is mixed with speech, as in TV shows. To address this, we take U-Net, Wave-U-Net, and MMDenseNet and modify them for music-speech separation. We also propose Wave-DenseNet, a waveform-input method based on DenseNet. A landmark-based audio fingerprint method is used for music identification. Although the signal-to-distortion ratio (SDR) is widely used as a measure of source separation, we confirm that it is not a suitable measure of how well the separated music signal can be identified, because the separation methods rank differently by SDR than by identification rate. Comparing the background music identification results, we show that the Wave-U-Net based separation method is the best music-speech separation method for background music identification.
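As a rough illustration of the SDR measure mentioned in the abstract, the sketch below computes a plain energy-ratio SDR in dB between a reference music signal and a separated estimate. This is an assumption-laden simplification, not the paper's evaluation code: the function name `sdr_db` and the toy signals are hypothetical, and published evaluations usually rely on a BSS Eval style toolkit that further decomposes the error.

```python
import math

def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Simplified definition for illustration; it does not perform the
    projection-based decomposition used by BSS Eval toolkits.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf   # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference, non-zero error
    return 10.0 * math.log10(signal_energy / error_energy)

# Toy example (hypothetical signals): a 440 Hz "music" tone, and an estimate
# that still contains a weak 150 Hz "speech" residue after separation.
SAMPLE_RATE = 16000
NUM_SAMPLES = 1600  # 0.1 s
music = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(NUM_SAMPLES)]
residue = [0.1 * math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(NUM_SAMPLES)]
estimate = [m + r for m, r in zip(music, residue)]

print(f"SDR: {sdr_db(music, estimate):.2f} dB")
```

A score like this summarizes waveform-level error only, which is consistent with the abstract's observation that ranking methods by SDR need not match ranking them by identification rate.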



  Cite this article

[IEEE Style]

H. Kim, W. Heo, J. Kim, J. Park, "Monaural Music-Speech Source Separation Based on Convolutional Neural Network for Background Music Identification in TV Shows," The Journal of Korean Institute of Communications and Information Sciences, vol. 45, no. 5, pp. 855-866, 2020. DOI: 10.7840/kics.2020.45.5.855.

[ACM Style]

Hyemi Kim, Woon-Haeng Heo, Junghyun Kim, and Jihyun Park. 2020. Monaural Music-Speech Source Separation Based on Convolutional Neural Network for Background Music Identification in TV Shows. The Journal of Korean Institute of Communications and Information Sciences, 45, 5, (2020), 855-866. DOI: 10.7840/kics.2020.45.5.855.

[KICS Style]

Hyemi Kim, Woon-Haeng Heo, Junghyun Kim, Jihyun Park, "Monaural Music-Speech Source Separation Based on Convolutional Neural Network for Background Music Identification in TV Shows," The Journal of Korean Institute of Communications and Information Sciences, vol. 45, no. 5, pp. 855-866, 5. 2020. (https://doi.org/10.7840/kics.2020.45.5.855)