
Medical Doctor and Information Technologies


Comparative analysis of data synthesis methods in the task of predicting atrial fibrillation and in-hospital mortality in patients with coronary heart disease after coronary artery bypass grafting

https://doi.org/10.25881/18110193_2024_4_28

Abstract

The aim of the study was to evaluate the performance of the SMOTE, GAN and VAE data synthesis methods in predicting postoperative atrial fibrillation (PoAF) and in-hospital mortality (IHM) in patients with coronary heart disease (CHD) after coronary artery bypass grafting (CABG).
Materials and methods. A single-center retrospective study analyzed the medical history data of 999 patients with CHD who underwent elective CABG. The end points of the study were PoAF and IHM. Predictive models were developed using three machine learning methods: multivariate logistic regression (MLR), random forest (RF) and eXtreme Gradient Boosting (XGB). Nine data synthesis methods were used to generate new minority class samples: five SMOTE-group methods, SOMO, GAN, WGAN and VAE.
Results. Comparison of the quality criteria of the PoAF and IHM predictive models developed on real and synthetic data showed that, for the MLR and RF models, the use of synthetic objects was not associated with an increase in prediction accuracy. For the XGB model in the IHM prediction task, where the majority class dominated (15 to 1), only the ProWRAS method was associated with an increase in prediction quality. When class imbalance was not substantial (4 to 1), as for the PoAF end point, the data synthesis methods did not improve prediction quality.
Conclusion. The use of the SMOTE, GAN and VAE methods does not guarantee an improvement in the accuracy of predictive models for PoAF and IHM in CHD patients after CABG.
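
For readers unfamiliar with how SMOTE-type methods generate new minority class samples, the basic interpolation idea can be sketched as follows. This is a minimal illustration of classic SMOTE only, not the authors' implementation and not any of the five SMOTE-group variants used in the study; the function name `smote_sketch`, its parameters and the toy data are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors
              (minority class only; hypothetical input format).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority-class records with 2 features each
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=3)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays within the range of the observed minority data; in practice, synthetic samples would be added to the training set only, never to the test set.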

About the Authors

K. I. Shakhgeldyan
Vladivostok State University
Russian Federation

DSc, Prof.



V. V. Kosterin
Vladivostok State University
Russian Federation


V. Yu. Rublev
Vladivostok State University; Far Eastern Federal University
Russian Federation


B. I. Geltser
Far Eastern Federal University; Vladivostok State University
Russian Federation

DSc, Prof., Corresponding Member of the RAS





For citations:


Shakhgeldyan K.I., Kosterin V.V., Rublev V.Yu., Geltser B.I. Comparative analysis of data synthesis methods in the task of predicting atrial fibrillation and in-hospital mortality in patients with coronary heart disease after coronary artery bypass grafting. Medical Doctor and Information Technologies. 2024;(4):28-37. (In Russ.) https://doi.org/10.25881/18110193_2024_4_28



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1811-0193 (Print)
ISSN 2413-5208 (Online)