A Comparative Analysis of Methods for Detecting and Diagnosing Breast Cancer Based on Data Mining
Ahmed T. Alhasani 1, Hussein Alkattan *2, Alhumaima Ali Subhi *3, El-Sayed M. El-Kenawy 4, Marwa M. Eid 4
1 Al-Furat Al-Awsat Technical University Computer Center Administrator, Najaf, Iraq
2 Department of System Programming, South Ural State University, Chelyabinsk 454080, Russia
3 Electronic Computer Center University of Diyala, Diyala, Iraq
4 Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura, Egypt
Emails: ahmed.alhasani@atu.edu.iq; alkattan.hussein92@gmail.com; alhumaimaali@uodiyala.edu.iq; skenawy@ieee.org; mmm@ieee.org
Abstract
Breast cancer is a significant public health concern worldwide, and early detection is crucial for its treatment. Although breast cancer has been studied extensively, there is still room to improve classification accuracy. This study aims to improve the classification accuracy of breast cancer diagnosis by applying information gain feature selection and machine learning techniques to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Information gain is used to reduce the number of features, and three machine learning algorithms, support vector machine (SVM), naive Bayes (NB), and the C4.5 decision tree, are employed for classification. The classifiers are then compared on the basis of accuracy. The proposed model achieves its maximum classification accuracy (100%), with weighted-average precision (100%) and recall (100%), using the C4.5 decision tree. SVM achieves an accuracy of 98.42%, with weighted-average precision of 98.17% and recall of 98.58%. The NB algorithm attains an accuracy of 96%, with weighted-average precision of 18.57% and recall of 50%. Compared with similar studies, the proposed model's results show significant progress, indicating new opportunities for breast cancer detection.
Keywords: Information gain feature selection; Machine learning; Support vector machine classifier; Naïve Bayes classifier; C4.5 decision tree classifier; Performance evaluation tests
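
To make the feature selection step named in the abstract concrete, the following is a minimal sketch of ranking numeric features by information gain. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of a single binary threshold per feature, and the toy data are all hypothetical, and the paper's front matter does not state how continuous WDBC attributes are discretized.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder

def best_gain(values, labels):
    """Best gain over midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in cuts), default=0.0)

def rank_features(columns, labels):
    """Return feature names sorted by descending information gain."""
    gains = {name: best_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)

# Toy data, invented for illustration only (not taken from WDBC).
labels = ["M", "M", "M", "B", "B", "B"]
columns = {
    "radius": [20.1, 18.3, 17.9, 12.0, 11.4, 13.2],   # separates the classes cleanly
    "texture": [15.0, 22.0, 18.0, 16.0, 21.0, 19.0],  # mixes the classes
}
ranking = rank_features(columns, labels)
```

In a pipeline of this kind, the top-ranked features would be kept and passed to the classifiers; how many features to retain is a design choice that this sketch leaves open.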