An Ensemble Machine Learning Method for Analyzing Various Medical Datasets

 

Chhaya Gupta 1, Nasib Singh Gill 2, Priti Maheshwary 3, Shraddha V. Pandit 4, Preeti Gulia 5, Piyush Kumar Pareek 6

 

1, 2, 5 Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, Haryana, India

3 Rabindranath Tagore University, Bhopal, India

4 Department of Artificial Intelligence and Data Science, PES Modern College of Engineering, Shivajinagar, Pune, India

6 Professor and Head, Department of AIML and IPR Cell, Nitte Meenakshi Institute of Technology, Bengaluru, India

 

Emails: chhaya.rs.dcsa@mdurohtak.ac.in; Nasib.gill@mdurohtak.ac.in; pritimaheshwary@gmail.com; shraddha.pandit@moderncoe.edu.in; preeti@mdurohtak.ac.in; piyush.kumar@nmit.ac.in

Abstract

In recent years, machine learning (ML) has had a significant impact on complicated problems in application domains including healthcare, economics, ecology, the stock market, surveillance, and commerce. ML techniques can handle a wide range of data, uncover meaningful relationships, offer insights, and spot trends. In medicine, ML can improve the accuracy, predictability, performance, and reliability of disease diagnosis. This paper reviews machine learning techniques applied to different medical datasets and proposes an ensemble method to support the early diagnosis of different diseases. The study compares existing machine learning techniques with the proposed ensemble method. The ensemble method uses the AdaBoost algorithm to combine the strengths of decision trees, random forests, and support vector machines. Three feature selection techniques, Fisher’s score, information gain, and a genetic algorithm, are used to select appropriate dataset features. The ensemble method is validated with K-fold cross-validation (k = 15). The Synthetic Minority Oversampling Technique (SMOTE) was employed to balance some of the datasets because they were highly imbalanced. All methods in this study are evaluated on accuracy, area under the curve (AUC), recall, precision, and F1-score. The paper uses medical datasets from the University of California Irvine (UCI) and Kaggle repositories to compare machine learning models with the proposed ensemble method. The results show that the ensemble method outperforms the existing machine learning techniques. The paper also analyzes how machine learning is used in the medical industry, covering established technologies and their impact on medical diagnosis. Early diagnosis is needed to protect people from deadly diseases; hence, this study proposes an ensemble method that may be used to diagnose different diseases early.
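The evaluation protocol named in the abstract (K-fold cross-validation with k = 15, scored by accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the split is unshuffled and unstratified, labels are assumed to be binary with 1 as the positive class, and AUC is left out because it requires predicted scores rather than hard labels.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 15) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs, in order."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample each.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Undefined ratios (zero denominator) are reported as 0.0 by convention here.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In use, a classifier would be trained on each `train` index list and scored on the matching `test` list, and the 15 per-fold metric dictionaries would then be averaged. For imbalanced medical data, a stratified split that preserves class proportions in each fold is usually the safer choice.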

Corresponding Author: Piyush Kumar Pareek, Email: piyush.kumar@nmit.ac.in

 

Received: September 22, 2023; Revised: January 19, 2024; Accepted: June 13, 2024

Keywords: Decision Tree Classifier; Ensemble Classifier; KNN Classifier; Naïve Bayes Classifier; Random Forest Classifier; Synthetic Minority Oversampling Technique