Neutrosophic Sets in Big Data Analytics: A Novel Approach for Feature Selection and Classification

Azmi Shawkat Abdulbaqi¹, Ahmed Dheyaa Radhi², Lateef Abd Zaid Qudr³, Harshavardhan Reddy Penubadi⁴,⁵, Ravi Sekhar⁴,*, Pritesh Shah⁴, Mrinal Bachute⁴, Jamal Fadhil Tawfeq⁶, Hassan Muwafaq Gheni⁷

¹University of Anbar, Renewable Energy Research Center, Ramadi, Iraq

²College of Pharmacy, University of Al-Ameed, Karbala PO Box 198, Iraq

³Department of Computer Techniques Engineering, AlSafwa University College, Almamalje str., 56001, Karbala, Iraq

⁴Symbiosis Institute of Technology, Pune Campus, Symbiosis International (Deemed University) (SIU), Pune 412115, Maharashtra, India

⁵Myriad Genetics, Salt Lake City, UT, USA

⁶Department of Medical Instrumentation Technical Engineering, Medical Technical College, Al-Farahidi University, Baghdad 00965, Iraq

⁷Computer Techniques Engineering Department, Al-Mustaqbal University College, Hillah 51001, Iraq

Emails: azmi_msc@uoanbar.edu.iq; ahmosawi@alameed.edu.iq; latifkhder@alsafwa.edu.iq; harshavdevops99@gmail.com; ravi.sekhar@sitpune.edu.in; pritesh.shah@sitpune.edu.in; mrinal.bachute@sitpune.edu.in; jamaltawfeq55@gmail.com; hasan.muwafaq@mustaqbal-college.edu.iq

Abstract

Big data analytics transforms large volumes of raw data into usable information, but feature selection and classification remain formidable challenges because of the complexity and high dimensionality of the data. Traditional methods often perform poorly because they cannot handle the uncertainty, imprecision, and inconsistency inherent in big data. This paper introduces a new methodology for these problems that uses neutrosophic sets to support more flexible and nuanced data analysis. The proposed approach makes three key contributions. First, it generalizes the classical set by extending it with the notions of truth, indeterminacy, and falsity, which allows uncertainty in the data to be represented. Second, it provides a feature selection process based on neutrosophic set theory and optimized by genetic algorithms. Third, it applies the selected features to train and validate classification models across several domains. The main aim of this study is to increase the accuracy and reliability of feature selection and classification in big data analytics. The methodology was implemented and tested on datasets from healthcare, finance, social media, and other domains. The results show improvement on conventional performance metrics. For example, the classification accuracy of an SVM classifier on the Cleveland Heart Disease dataset increases from 83.5% to 87.2%, that of a Random Forest classifier on a financial dataset from 76.4% to 81.9%, and that of social media sentiment analysis from 78.3% to 82.7%. These findings indicate that the neutrosophic set-based method offers clear advantages in addressing the limitations of classical alternatives. By modeling uncertainty explicitly, the proposed neutrosophic approach improves classification performance and, at the same time, increases the overall robustness and reliability of big data analytics. This study lays the groundwork for further research and practical applications and points to possible further development in this field.

Keywords: Neutrosophic Sets; Big Data Analytics; Feature Selection; Classification; Uncertainty Modeling; Indeterminacy; Genetic Algorithms; Support Vector Machine; Random Forest