Full Length Article
Fusion: Practice and Applications
Volume 13, Issue 1, PP: 08-18, 2023

Title

Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers

  Ahmed A. H. Alkurdi 1 *

1  Department of Information Technology Management, Technical College of Administration, Duhok Polytechnic University, Duhok, KRG-Iraq; Department of Computer Science, College of Science, Nawroz University, Duhok, KRG-Iraq
    (Ahmed.alaa@dpu.edu.krd)


Doi: https://doi.org/10.54216/FPA.130101

Received: March 02, 2023 Revised: June 01, 2023 Accepted: August 04, 2023

Abstract :

Heart diseases are the primary cause of death worldwide, with cardiovascular diseases claiming approximately 18 million lives per year. Many of these lives could be saved with early and accurate diagnosis and prediction of such conditions. Automating this process is therefore crucial, and it has become achievable with the rise of machine learning and deep learning. However, patient data is riddled with issues that must be resolved before it can be used for heart disease prediction. This research aims to improve the accuracy of heart disease diagnosis by combining data preprocessing techniques with classification algorithms. These techniques may provide insight into predicting cardiovascular diseases from subtle clues before any major symptoms arise. The study employs the Heart Disease UCI dataset and follows a systematic approach to training machine learning models for heart disease diagnosis. The approach prepares the data for model training with several preprocessing techniques: mean imputation of missing values, normalization, the Synthetic Minority Over-sampling Technique (SMOTE), and correlation analysis for feature selection. The preprocessed data is then fed into four popular classification algorithms: Decision Tree, Random Forest, Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN). Together, these algorithms provide a broad evaluation of the dataset. The proposed methodology yields promising results that highlight the value of data preprocessing, as shown by the achieved accuracy, precision, recall, F1 score, and ROC AUC. In summary, preprocessing and feature selection are clearly important when a dataset presents challenges such as missing values and class imbalance. These processes play a central role in building a trustworthy and precise model for heart disease prediction.
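The four preprocessing steps named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the author's implementation, which the abstract does not give. The function names (`impute_mean`, `min_max_normalize`, `smote`, `rank_features`), the choice of min-max scaling for normalization, the use of Pearson correlation against the label for feature ranking, and the toy values are all assumptions made for illustration. A practical study would more likely rely on libraries such as scikit-learn and imbalanced-learn.

```python
import math
import random

def impute_mean(rows):
    """Replace None entries in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(r, lows, spans)]
            for r in rows]

def smote(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority neighbours. Needs at least 2 minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: math.dist(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(rows, labels):
    """Column indices sorted by absolute correlation with the label, strongest first."""
    cols = list(zip(*rows))
    scores = [abs(pearson(c, labels)) for c in cols]
    return sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)

# Toy data, made up for illustration (not taken from the Heart Disease UCI dataset).
# Columns: two hypothetical numeric features; None marks a missing value.
X = [[63, 233.0], [41, None], [56, 236.0], [57, 354.0], [67, 286.0], [62, 268.0]]
y = [0, 0, 0, 0, 1, 1]

X = min_max_normalize(impute_mean(X))
minority = [x for x, t in zip(X, y) if t == 1]
new_rows = smote(minority, n_new=2, k=1)   # balance the classes 4 vs 4
X_bal = X + new_rows
y_bal = y + [1] * len(new_rows)
order = rank_features(X_bal, y_bal)        # feature indices, most correlated first
```

In practice, oversampling is usually applied to the training split only, so that synthetic samples do not leak into the data used to measure accuracy, precision, recall, F1 score, and ROC AUC.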

Keywords :

Machine Learning; Classification; Preprocessing; Feature Selection; Heart Disease.


Cite this Article as :
MLA: Ahmed A. H. Alkurdi. "Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers." Fusion: Practice and Applications, Vol. 13, No. 1, 2023, PP. 08-18 (Doi: https://doi.org/10.54216/FPA.130101)
APA: Ahmed A. H. Alkurdi. (2023). Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers. Journal of Fusion: Practice and Applications, 13(1), 08-18 (Doi: https://doi.org/10.54216/FPA.130101)
Chicago: Ahmed A. H. Alkurdi. "Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers." Journal of Fusion: Practice and Applications, 13 no. 1 (2023): 08-18 (Doi: https://doi.org/10.54216/FPA.130101)
Harvard: Ahmed A. H. Alkurdi. (2023). Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers. Journal of Fusion: Practice and Applications, 13(1), 08-18 (Doi: https://doi.org/10.54216/FPA.130101)
Vancouver: Ahmed A. H. Alkurdi. Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers. Journal of Fusion: Practice and Applications, (2023); 13(1): 08-18 (Doi: https://doi.org/10.54216/FPA.130101)
IEEE: Ahmed A. H. Alkurdi, Enhancing Heart Disease Diagnosis Using Machine Learning Classifiers, Journal of Fusion: Practice and Applications, Vol. 13, No. 1, (2023): 08-18 (Doi: https://doi.org/10.54216/FPA.130101)