Neutrosophic Sets in Big Data Analytics: A Novel Approach for Feature Selection and Classification
Azmi Shawkat Abdulbaqi1, Ahmed Dheyaa Radhi2, Lateef Abd Zaid Qudr3, Harshavardhan Reddy Penubadi4,5, Ravi Sekhar4,*, Pritesh Shah4, Mrinal Bachute4, Jamal Fadhil Tawfeq6, Hassan Muwafaq Gheni7
1University of Anbar, Renewable Energy Research Center, Ramadi, Iraq
2College of Pharmacy, University of Al-Ameed, Karbala PO Box 198, Iraq
3Department of Computer Techniques Engineering, AlSafwa University College, Almamalje str., 56001, Karbala, Iraq
4Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University) (SIU), Pune 412115, Maharashtra, India
5Myriad Genetics, Salt Lake City, UT, USA
6Department of Medical Instrumentation Technical Engineering, Medical Technical College, Al-Farahidi University, Baghdad 00965, Iraq
7Computer Techniques Engineering Department, Al-Mustaqbal University College, Hillah 51001, Iraq
Emails: azmi_msc@uoanbar.edu.iq; ahmosawi@alameed.edu.iq; latifkhder@alsafwa.edu.iq; harshavdevops99@gmail.com; ravi.sekhar@sitpune.edu.in; pritesh.shah@sitpune.edu.in; mrinal.bachute@sitpune.edu.in; jamaltawfeq55@gmail.com; hasan.muwafaq@mustaqbal-college.edu.iq
Abstract
Big data analytics promises to transform huge amounts of raw data into valuable, usable information, but the complexity and high dimensionality of the data pose formidable challenges for feature selection and classification. Traditional methods are usually ill-equipped to handle the uncertainty, imprecision, and inconsistency inherent in big data, and they often perform poorly. This paper introduces a new methodology for these problems that uses neutrosophic sets to enable more flexible and nuanced data analysis. The key contributions of the proposed approach are threefold. First, it generalizes the classical set through the notions of truth, indeterminacy, and falsity, allowing uncertainty in data to be represented explicitly. Second, it introduces a robust feature selection process based on neutrosophic set theory and optimized by genetic algorithms. Third, it applies the selected features to train and validate classification models across a range of domains. The major aim of this study is therefore to increase the accuracy and reliability of feature selection and classification in big data analytics. The methodology has been implemented and tested on datasets from healthcare, finance, social media, and other domains. The results show marked improvements on conventional performance metrics: the classification accuracy of an SVM classifier on the Cleveland Heart Disease dataset increased from 83.5% to 87.2%, and that of a Random Forest classifier on a financial dataset from 76.4% to 81.9%. Similarly, the accuracy of social media sentiment analysis rose from 78.3% to 82.7%. These findings indicate that the neutrosophic set-based method has clear advantages in addressing the limitations of classical alternatives.
Through explicit modeling of uncertainty, the proposed neutrosophic approach improves classification performance and, at the same time, increases the overall robustness and reliability of big data analytics. The significance of this study lies in laying the groundwork for further research and practical applications and in pointing to possible further developments in this field.
Keywords: Neutrosophic Sets; Big Data Analytics; Feature Selection; Classification; Uncertainty Modeling; Indeterminacy; Genetic Algorithms; Support Vector Machine; Random Forest