Implementation of Federated Learning Using Probabilistic Sampling Techniques Based on Data Distribution Estimation to Solve Statistical Heterogeneity Problems 


Vol. 46, No. 11, pp. 1941-1949, Nov. 2021
10.7840/kics.2021.46.11.1941


  Abstract

Statistical heterogeneity means that the data used by the many users participating in Federated Learning (FL), collected from different devices, dynamic environments, times, and places, does not satisfy the IID (independent and identically distributed) condition and instead follows an unbalanced (non-IID) distribution. In this paper, to address the statistical heterogeneity problem of FL, we estimate the global data distribution from the local data distributions, propose and implement a process that samples data stochastically, and compare the resulting performance. We estimate the distribution of the total data from the distributions of the local data, without direct access to the local data, and then adjust the local data distributions accordingly. We implement the process functions in an open-source framework and train a classification model on the MNIST (Modified National Institute of Standards and Technology database) data. We run basic FL and FL with the sampling technique proposed in this study for up to 100 rounds in various environments and compare their performance. The two approaches showed similar performance, with an accuracy of 0.89-0.91 and an AUROC (Area Under the Receiver Operating Characteristic Curve) of 0.98-0.99, but with the sampling technique the learning time per round was reduced by about 9% and the performance heterogeneity between local clients decreased by about 1.5%.
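
The idea of estimating a global distribution from local distributions and then sampling stochastically can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify what clients share or how sampling probabilities are computed. The sketch assumes that each client reports only its per-label counts, that the server sums them into an estimated global label distribution, and that each client then keeps each local example with a label-dependent probability. The function names `estimate_global_distribution`, `keep_probabilities`, and `sample_local` are hypothetical.

```python
import random
from collections import Counter


def estimate_global_distribution(client_counts):
    """Sum per-client label counts into an estimated global label distribution.

    Assumption: clients share label counts only, never raw examples.
    """
    total = Counter()
    for counts in client_counts:
        total.update(counts)
    n = sum(total.values())
    return {label: c / n for label, c in total.items()}


def keep_probabilities(local_counts, global_dist):
    """Per-label keep probability for one client.

    The expected number of retained examples per label is proportional to
    the global share of that label (over labels the client actually holds).
    The label that is scarcest relative to its global share is kept with
    probability 1, so no example is ever duplicated.
    """
    ratios = {
        label: global_dist.get(label, 0.0) / count
        for label, count in local_counts.items()
        if count > 0
    }
    max_ratio = max(ratios.values())
    return {label: r / max_ratio for label, r in ratios.items()}


def sample_local(labels, probs, rng):
    """Return the indices of local examples kept by independent coin flips."""
    return [i for i, y in enumerate(labels) if rng.random() < probs[y]]


# Two hypothetical clients with mirrored, strongly skewed label counts.
client_a = {0: 90, 1: 10}
client_b = {0: 10, 1: 90}

global_dist = estimate_global_distribution([client_a, client_b])
probs_a = keep_probabilities(client_a, global_dist)

# Client A's local labels, subsampled with a fixed seed for repeatability.
labels_a = [0] * 90 + [1] * 10
kept = sample_local(labels_a, probs_a, random.Random(0))
kept_counts = Counter(labels_a[i] for i in kept)
```

In this toy setup the estimated global distribution is balanced, so client A keeps every example of its rare label and roughly one in nine of its dominant label. Downsampling in this way shrinks the local training set, which is consistent in spirit with a shorter round time, though the sketch makes no claim about the figures reported in the paper.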



  Cite this article

[IEEE Style]

S. U. Kim, H. Lee, J. Bang, S. E. Hong, H. J. Kim, "Implementation of Federated Learning Using Probabilistic Sampling Techniques Based on Data Distribution Estimation to Solve Statistical Heterogeneity Problems," The Journal of Korean Institute of Communications and Information Sciences, vol. 46, no. 11, pp. 1941-1949, 2021. DOI: 10.7840/kics.2021.46.11.1941.

[ACM Style]

Seon Uk Kim, Hyeonsu Lee, Junil Bang, Sung Eon Hong, and Hwa Jong Kim. 2021. Implementation of Federated Learning Using Probabilistic Sampling Techniques Based on Data Distribution Estimation to Solve Statistical Heterogeneity Problems. The Journal of Korean Institute of Communications and Information Sciences, 46, 11, (2021), 1941-1949. DOI: 10.7840/kics.2021.46.11.1941.

[KICS Style]

Seon Uk Kim, Hyeonsu Lee, Junil Bang, Sung Eon Hong, Hwa Jong Kim, "Implementation of Federated Learning Using Probabilistic Sampling Techniques Based on Data Distribution Estimation to Solve Statistical Heterogeneity Problems," The Journal of Korean Institute of Communications and Information Sciences, vol. 46, no. 11, pp. 1941-1949, 11. 2021. (https://doi.org/10.7840/kics.2021.46.11.1941)