Beomseo Choi♦, Hongjun Kim* and Seung Hyun Jeon°
LSTM-Based Time Series Forecasting of Pulmonary Function Test for COPD Early Diagnosis
Abstract: Chronic Obstructive Pulmonary Disease (COPD) is a serious lung disease that makes breathing difficult and is not easily detected. Although machine learning techniques for early diagnosis of COPD have been developed, time series prediction studies based on Pulmonary Function Test (PFT) data are still lacking. We use PFT data with irregular measurement intervals and propose a Long Short-Term Memory (LSTM) model that predicts PFT values for the next quarter (1Q) from the past two quarters (2Q) and classifies whether COPD occurs. The data were interpolated to resolve the imbalanced time intervals. To confirm the validity of the augmented data, Multivariate Analysis of Variance (MANOVA) was performed, and it showed no significant difference between the original and interpolated data. Mean Absolute Percentage Error (MAPE), recall, and F1 score, which is the harmonic mean of precision and recall for classification, were measured for two test scenarios: only the original data and the augmented data. We found that, compared with the original data, the interpolated data decreased MAPE by almost 7% and improved recall and F1 score for obstructive pulmonary disease by almost 22% and 12%, respectively. In addition, COPD can be predicted within 3 months regardless of whether the patient is a smoker or a non-smoker.
Keywords: Chronic Obstructive Pulmonary Disease , Early Diagnosis , Pulmonary Function Test , Long Short-Term Memory , Interpolati
Ⅰ. Introduction
Chronic Obstructive Pulmonary Disease (COPD) is an irreversible chronic lung disease that narrows the airways over a long period of time. According to the World Health Organization (WHO), COPD was the third leading cause of death globally in 2019 and was responsible for approximately 6% of all deaths. COPD is rare in low-income countries, but it ranks in the top five causes of death in countries of all other income levels.
COPD is caused by several risk factors, including exposure to smoking and air pollution. COPD cannot be cured in a short period of time, so early diagnosis and prompt treatment play an important role in reducing COPD mortality. Early symptoms of COPD include chronic cough and phlegm, fatigue, and shortness of breath. Early diagnosis of COPD is difficult because these early symptoms are not clear. Currently, chest X-ray, Pulmonary Function Test (PFT), chest Computed Tomography (CT), and Arterial Blood Gas Analysis (ABGA) are used to diagnose COPD[1]. Among them, PFT data are easy to obtain because the test is inexpensive and quick. However, prior research on early diagnosis of COPD mainly relied on image analysis such as CT. H. Park et al. conducted a study predicting spirometry from CT images. They classified high-risk participants by spirometry values and used a Convolutional Neural Network (CNN)[2]. Although classification research through machine learning is active, time series analysis using PFT time series data is still lacking.
We use intermittent PFT time series data from multiple potential patients to predict Forced Vital Capacity (FVC), Forced Expiratory Volume in 1 second ($$FEV_1$$), and types of ventilatory disorders. However, the imbalance of measurement intervals per patient must be resolved before time series analysis. In this paper, we propose a Long Short-Term Memory (LSTM) based COPD prediction framework to diagnose a patient's COPD within the next quarter (Q). First, to resolve the inconsistency of the observation intervals, the training data were downsampled in Q units during preprocessing and augmented with fill and interpolation. We test the validity of the augmented data using Multivariate Analysis of Variance (MANOVA) and confirm that there is no significant difference between the augmented and the original data.
To verify the MANOVA results, two scenarios with the augmented and original test data are presented. We predict the future 1Q from the well-refined training data of the past 2Q. With the augmented test data, MAPE decreased by 7% and the F1 score improved by 4%. Among ventilatory disorders, recall and F1 score for the obstructive type improved by 22% and 12%, respectively, with the augmented test data.
Ⅱ. Related Works
A lot of research has been conducted to diagnose lung disease using artificial intelligence (AI) technologies[3]. In this section, we describe previous research on machine learning and deep learning related to COPD, as well as interpolation for data preprocessing.
2.1 AI Approaches to Predict COPD Diagnosis
L. Beverin et al. showed high performance in predicting lung disease with machine learning on PFT data. The authors predicted Total Lung Capacity (TLC) using Random Forest (RF)[4]. The sensitivity, specificity, and F1 score of the algorithm predicting restrictive ventilatory impairment were 83%, 92%, and 75%, respectively. This study uses features similar to ours, but their RF model does not predict COPD. In contrast, D. Spathis and P. Vlamos classified COPD using RF[5] with a precision of 97.7%. The authors used PFT and ABGA data. Through the feature importance of RF, they showed that smoking, FVC, $$FEV_1$$, and age are important factors for COPD. Their model can predict COPD at the time the patient is tested. However, even if the patient does not have COPD at the time of measurement, COPD may appear in the future if the patient's condition is worsening. Since we use PFT time series data, the near-future onset of COPD can be predicted by considering changes in the patient's condition. Other studies have used the LSTM model for early diagnosis of COPD.
V. Nunavath et al. predicted the health status of COPD patients[6]. The authors used an LSTM model based on ABGA data. The LSTM model was trained on data from the past 5 days and showed an accuracy of 84.12% in predicting the patient's health status one day in advance. This approach does not distinguish whether patients have COPD or not. D. Perna and A. Tagarelli proposed a learning framework using respiratory sound data and Recurrent Neural Network (RNN)-based LSTM, Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Bidirectional GRU (BiGRU)[7]. Among the four models, LSTM consistently performed best. We therefore adopt an LSTM-based approach for COPD prediction.
2.2 Interpolation Approaches to Augment Insufficient COPD Data
Recent research has improved performance through interpolation when input data are insufficient. O. O. Abayomi-Alli et al. used biomedical voice measurements from the Oxford Parkinson Disease dataset and BiLSTM for early detection of Parkinson's disease[8]. The research uses interpolation to augment the small dataset. The interpolation methods used were cubic spline and Piecewise Cubic Hermite Interpolating Polynomial (PCHIP). PCHIP creates a cubic Hermite interpolating polynomial from the data points in each data interval. Each piece is monotonic and connects smoothly to the pieces in neighboring intervals[9]. O. O. Abayomi-Alli et al. argue that the main limitation of interpolation is that it produces out-of-range and noisy data. In this paper, to avoid this problem, linear interpolation and PCHIP interpolation are used instead of cubic spline interpolation, because both are monotonic. H. Watz et al. used interpolation for statistical analysis of COPD[10]. The authors used data from patients who completed lung capacity measurements daily for 56 days, at least once per week, for COPD postmortem analysis.
Missing values in the data were filled in using the linear interpolation, fill, and carry forward methods. $$FEV_1$$ declines continuously and smoothly with age[11]. Thus, two interpolations are presented in this paper. The first is linear interpolation, and the second is PCHIP. Cubic spline interpolation is not considered in this paper because negative numbers may occur. Since $$FEV_1$$ does not oscillate, cannot be negative, and changes gradually, we adopt linear interpolation and PCHIP, which are monotonic interpolations.
Ⅲ. System Model
3.1 Observed Data
The dataset was provided by Chungnam National University Hospital and collected between January 1, 2020, and July 31, 2022. Table 1 shows the features of the proposed framework. One-hot encoding was performed for sex. PFT proceeds in three stages: inhale with maximum effort, exhale with maximum effort, and inhale again with maximum effort. During PFT, flow and volume are measured and expressed as a time-volume curve and a volume-flow curve. The two curves show FVC, $$FEV_1$$, Forced Expiratory Flow (FEF), and Peak Expiratory Flow (PEF). FVC is the volume of air exhaled with maximum effort after inhaling as much as possible. $$FEV_1$$ is the volume of air exhaled in the first second of that maximal-effort exhalation. $$FEF_{25\sim75\%}$$ is the average airflow between 25% and 75% of FVC. PEF is the maximum airflow achieved during exhalation with maximal effort. FVC, $$FEV_1$$, and PEF are shown on the time-volume curve in Fig. 1. PEF and FEF are shown on the volume-flow curve in Fig. 2. FVC% and $$FEV_1$$/FVC can be used to distinguish types of ventilatory disorders[12]. FVC% is FVC (measured)/FVC (predicted). $$FEV_1$$/FVC is $$FEV_1$$ (measured)/FVC (measured). Predicted values are calculated by spirometric reference equations.
We used Morris's reference equation[13]. The criteria for classification of ventilatory disorders are shown in Fig. 3. A restrictive disorder is characterized by decreased TLC and therefore shows a decrease in FVC. An obstructive disorder is characterized by narrowing of the airways. A mixed disorder shows both restrictive and obstructive characteristics.
Table 1. The features of the proposed framework.
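As an illustration of how FVC% and $$FEV_1$$/FVC separate the disorder types, the following is a minimal sketch. The exact criteria are those of Fig. 3, which are not reproduced in the text, so the cutoffs used here (FVC% below 80 for restriction, $$FEV_1$$/FVC below 0.70 for obstruction) are assumptions based on commonly used spirometry thresholds, and the function names are hypothetical.

```python
def fvc_percent(fvc_measured, fvc_predicted):
    """FVC% = FVC (measured) / FVC (predicted), expressed in percent."""
    return 100.0 * fvc_measured / fvc_predicted


def classify_ventilatory_disorder(fvc_pct, fev1_fvc,
                                  fvc_pct_cutoff=80.0, ratio_cutoff=0.70):
    """Label a PFT result as normal, restrictive, obstructive, or mixed.

    ASSUMPTION: both cutoffs are common clinical defaults, not values
    taken from the paper's Fig. 3. Adjust them to the actual criteria.
    """
    restricted = fvc_pct < fvc_pct_cutoff   # reduced FVC relative to predicted
    obstructed = fev1_fvc < ratio_cutoff    # reduced FEV1/FVC ratio
    if restricted and obstructed:
        return "mixed"
    if obstructed:
        return "obstructive"
    if restricted:
        return "restrictive"
    return "normal"


if __name__ == "__main__":
    pct = fvc_percent(fvc_measured=3.6, fvc_predicted=4.0)
    print(classify_ventilatory_disorder(pct, fev1_fvc=0.62))
```

Keeping the cutoffs as parameters makes it easy to substitute the criteria actually used in Fig. 3.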
3.2 Preprocessing
This section describes a method for keeping the intervals of time series data constant and a method for handling missing values. The proposed framework is shown in Fig. 4. The measurement intervals in the dataset differ because PFT data were obtained from smokers and non-smokers alike. To solve this problem, we assume observations in Q units and perform downsampling at Q intervals. If the PFT frequency increases in the future, downsampling can be performed at monthly intervals, which are shorter than quarterly. The fill method was used for the sex and pack year features because each patient has only one value for them. Linear interpolation does not reflect the characteristic of age, which increases by one year with each birthday. Since there was no information on the patient's birthday, it was assumed to be January 1st, so the age is increased by one year every January 1st. Min-max normalization was performed. The numbers of original and augmented data are 1,408 and 1,446, respectively, which is a ratio of 49:51.
We validate the use of augmented data to predict COPD. For the original data group and the augmented data group, pack year, age, height, weight, FVC, $$FEV_1$$, $$FEF_{25\sim75\%}$$, and PEF are analyzed using MANOVA. We set alpha to 0.05, and the results are shown in Table 2. The p-value of the MANOVA was 0.1083, which is larger than alpha. The null hypothesis that "the overall vector averages of the two groups are the same" therefore cannot be rejected, so no significant difference between the two groups was detected. To perform MANOVA, multivariate normality and multivariate homoscedasticity must be satisfied. If the absolute value of skewness is greater than 3 or the absolute value of kurtosis is greater than 10, there is a problem with normality[14].
The skewness of the original data is between approximately -1.22 and 1.45, and the kurtosis is between approximately -0.38 and 2.25. The skewness of the augmented data is between approximately -1.22 and 1.46, and the kurtosis is between approximately -0.50 and 3.16. Therefore, we consider multivariate normality to be satisfied.
Table 2. The results of MANOVA for raw dataset and augmented dataset. Num DF is the Numerator Degrees of Freedom, Den DF is the Denominator Degrees of Freedom.
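The skewness and kurtosis screen described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes moment-based (population) estimators and excess kurtosis, which is suggested by the negative kurtosis values reported but not stated in the text. The function names are hypothetical.

```python
def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2). Requires non-constant data."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (0 for a normal law)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def passes_normality_screen(values, skew_limit=3.0, kurt_limit=10.0):
    """Apply the rule of thumb from the text: flag a feature when
    |skewness| > 3 or |kurtosis| > 10."""
    return (abs(skewness(values)) <= skew_limit
            and abs(excess_kurtosis(values)) <= kurt_limit)


if __name__ == "__main__":
    feature = [2.1, 2.4, 2.6, 2.9, 3.0, 3.3, 3.8]  # made-up FVC-like values
    print(skewness(feature), excess_kurtosis(feature),
          passes_normality_screen(feature))
```

In practice the screen would be applied to each feature column of both the original and the augmented group.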
Box’s M Test was performed to test multivariate homoscedasticity. We set alpha to 0.001. The p-value of Box’s M Test is 0.02904, which is larger than the alpha value. Since the null hypothesis that variances between multivariate groups are equal cannot be rejected, homoscedasticity is satisfied. The detailed results of the Box’s M Test are shown in Table 3. Table 3. The results of Box’s M Test for raw dataset and augmented dataset.
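The quarterly downsampling and gap filling described in this section can be sketched in plain Python. This is a minimal sketch under stated assumptions: measurements are given as (date, value) pairs for one patient and one feature, the last measurement in a quarter is kept when several fall into the same quarter, and only linear interpolation is shown (PCHIP is omitted). All function names are hypothetical.

```python
from datetime import date


def quarter_index(d):
    """Map a date to a running quarter number (4 per calendar year)."""
    return d.year * 4 + (d.month - 1) // 3


def downsample_quarterly(measurements):
    """Keep the last value per quarter and return one slot per quarter
    from the first to the last observed quarter (None = no measurement)."""
    last_per_quarter = {}
    for d, value in sorted(measurements, key=lambda m: m[0]):
        last_per_quarter[quarter_index(d)] = value  # later dates overwrite
    first, last = min(last_per_quarter), max(last_per_quarter)
    return [last_per_quarter.get(q) for q in range(first, last + 1)]


def interpolate_linear(series):
    """Fill None gaps on a straight line between the nearest observed
    neighbours. The first and last slots are always observed here."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


if __name__ == "__main__":
    fev1 = [(date(2020, 1, 15), 3.0), (date(2020, 3, 20), 2.8),
            (date(2020, 10, 5), 2.2)]  # made-up values
    quarterly = downsample_quarterly(fev1)
    print(quarterly)
    print(interpolate_linear(quarterly))
```

Constant features such as sex and pack year would instead be filled with the patient's single value, as described in the text.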
3.3 Proposed LSTM-Based COPD Forecasting Framework
The model takes input data at times T-1 and T and outputs data at time T+1. In other words, we predict future 1Q data from past 2Q data. Fig. 5 shows an example of the preprocessing process. This paper downsamples irregular time series data. When several measurements are merged by downsampling, the value of the last measurement is used. To extract samples for the LSTM, the sample size was set to 3, which is the sum of the past 2Q and the future 1Q, and sampling was performed with a sliding window. The entire observed data includes augmented data. To distinguish between augmented data and original data, if there are no missing values in a sample, the starting index of the sample is stored in raw_index_list. Otherwise, it is stored in not_raw_index_list. The missing values of the downsampled data are then filled by interpolation or filling. The example in Fig. 5 uses linear interpolation. Finally, sampling is performed using the index information in raw_index_list and not_raw_index_list. We define two test scenarios and conduct experiments.
· Augmented Test Data (ATD): Both augmented data and original data are used without distinction. 20% of the total is used as test data, and the remaining 80% is used as training and validation data.
· Raw Test Data (RTD): The augmented data is used as training and validation data, and the original data is used as test data.
Even though there was no significant difference between the original data and the augmented data, there was a difference in the samples of ATD and RTD. This difference arises because the criterion for recognizing raw samples during sample extraction is quite restrictive. The number and ratio of samples are shown in Table 8 in the Appendix. In ATD, we apply a stratified split to divide the data into training and validation data and test data. As RTD only uses raw data as test data, the stratified split cannot be applied to RTD.
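The sample extraction described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a single downsampled feature series in which missing quarters are marked with None. The list names raw_index_list and not_raw_index_list follow the text, while the function names are hypothetical.

```python
def split_window_indices(series, window=3):
    """Slide a window of 3 quarters (past 2Q + future 1Q) over the
    downsampled series and record each window's starting index.

    A window with no missing value is a raw sample; any window that
    touches a missing quarter counts as augmented.
    """
    raw_index_list, not_raw_index_list = [], []
    for start in range(len(series) - window + 1):
        sample = series[start:start + window]
        if all(value is not None for value in sample):
            raw_index_list.append(start)
        else:
            not_raw_index_list.append(start)
    return raw_index_list, not_raw_index_list


def build_samples(filled_series, indices, n_past=2):
    """Cut (input, target) pairs out of the gap-filled series:
    input = values at T-1 and T, target = value at T+1."""
    return [(filled_series[i:i + n_past], filled_series[i + n_past])
            for i in indices]


if __name__ == "__main__":
    downsampled = [2.8, None, 2.4, 2.3, 2.2]   # made-up quarterly FEV1
    filled = [2.8, 2.6, 2.4, 2.3, 2.2]         # after linear interpolation
    raw_idx, aug_idx = split_window_indices(downsampled)
    # RTD scenario: raw windows are held out as test data, while the
    # augmented windows serve as training and validation data.
    test_samples = build_samples(filled, raw_idx)
    train_samples = build_samples(filled, aug_idx)
    print(test_samples, train_samples)
```

Recording the indices before filling is what allows the raw and augmented samples to be told apart after interpolation.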
To split training and validation data, stratified K-fold cross-validation is performed. Stratified techniques can reduce bias when splitting data evenly or evaluating model performance. We apply K=5 and report the average of the evaluations over the folds as the result. MAPE was used as a regression metric. Accuracy, precision, recall, and F1 score were used as classification metrics. Mean Squared Error (MSE) was used as a loss metric.
MAPE is defined as
$$MAPE = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right|$$
where $$A_i$$ is an actual value, $$F_i$$ is a forecast value, and n is a sample size.
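Accuracy, precision, and recall are defined as
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}, \quad Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}$$
where TP is a true positive, FP is a false positive, TN is a true negative, and FN is a false negative.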
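The F1 score, which is the harmonic mean of precision and recall and is used for COPD classification, is expressed by (4) and (5). As a harmonic mean, it takes the form $$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$.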
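The regression and classification metrics named above can be computed as in the following sketch. It uses the standard definitions of MAPE, precision, recall, and F1 score and evaluates one class at a time (one-vs-rest). The function names and the example labels are hypothetical.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent.
    Actual values must be non-zero."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 score for one class, treating that
    class as positive and every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    print(mape([100.0, 200.0], [90.0, 220.0]))
    truth = ["obstructive", "obstructive", "normal", "normal"]
    pred = ["obstructive", "normal", "obstructive", "normal"]
    print(precision_recall_f1(truth, pred, positive="obstructive"))
```

Recall is the quantity that penalizes positive patients being judged negative, which is why it is reported separately for each ventilatory disorder.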
Ⅳ. Experimental Results
This section summarizes the hyperparameters in Table 4 and presents the test results of each scenario in Tables 5, 6, and 7. Table 5 shows the MAPE for each age group for the two test scenarios. The ATD showed a lower error than the RTD, and the total average MAPE decreased by 7%. The lower MAPE is shown in boldface. Table 6 shows the accuracies by age group for the two test scenarios. The ATD showed higher accuracy than the RTD. The higher accuracy is shown in boldface.
Table 4. Hyperparameters for ATD and RTD.
Table 5. MAPE by age groups and ventilatory disorders for ATD and RTD.
Table 6. Accuracy of classification by age groups for ATD and RTD.
Table 7 shows the precisions, recalls, and F1 scores of the two test scenarios. The ATD showed higher performance than the RTD. The higher precisions, recalls, and F1 scores are shown in boldface. In the medical field, recall is considered important because it reflects how often actual positive patients are judged negative. In addition, medical data have severe imbalances between classes, so the F1 score is also an effective metric. In classification, the recall and F1 score of the obstructive type increased by 22% and 12%, respectively. Comparing the ATD and RTD results, the error was reduced when augmented data were used as test data. In principle, only the original data should be used as test data. In the proposed experiment, however, MANOVA confirmed that there was no significant difference between the two groups, and the results indicate that meaningful COPD predictions can be obtained with augmented data even though the measurement intervals of the PFT data are irregular.
Table 7. Precisions, recalls, and F1 scores for ATD and RTD.
Compared to ATD, which includes augmented samples as test data, RTD has less test data. The difference in the number of test data between the two scenarios therefore led to different results. Additionally, because ATD is trained for fewer epochs and has more test data than RTD, and because the class distributions of its training and validation data and its test data are similar, ATD achieves better performance than RTD.
Ⅴ. Conclusion
By examining the outcomes of both the ATD and RTD, we have confirmed that interpolated data, whose validity was verified by MANOVA for PFT time series data, are reliable and can lead to better performance in MAPE, recall, and F1 score. PFT is relatively simple and inexpensive compared to other tests for early diagnosis of COPD. However, because the health status and severity of each patient were different, the measurement intervals were not consistent. In addition, it was not easy to obtain sufficient PFT data to predict COPD. Nevertheless, because interpolation yields reliable COPD predictions from PFT data in terms of recall and F1 score, it can provide medical staff with reliable reference predictions, in contrast to traditional COPD judgments made with the naked eye or through expensive tests for seriously ill patients.
Ⅵ. Appendix
Table 8. The number and ratio of samples used in LSTM. A restrictive sample in the age group '30s' was removed from the ATD. Because there was only one such sample, the stratified split could not be applied.
Biography
Seung Hyun Jeon
Feb. 2017 : Ph.D. School of Electrical Engineering, KAIST, Korea
Jul. 2018~Mar. 2023 : KT R&D Center, KT, Korea
Mar. 2023~Current : Dept. of Computer Engineering, Daejeon Univ.
[Research Interests] Machine learning, blockchain networks, energy consumption models for networks.
[ORCID:0000-0001-7303-4672]
Cite this article: B. Choi, H. Kim, S. H. Jeon, "LSTM-Based Time Series Forecasting of Pulmonary Function Test for COPD Early Diagnosis," The Journal of Korean Institute of Communications and Information Sciences, vol. 49, no. 3, pp. 346-355, 2024. DOI: 10.7840/kics.2024.49.3.346.