Optimizing Diabetes Diagnosis: HFM with Tree-Structured Parzen Estimator for Enhanced Predictive Performance and Interpretability
Hemalatha Dendukuri1, Kachapuram Basava Raju2, S. Phani Praveen3,*, Janjhyam V. Naga Ramesh4, Vahiduddin Shariff5, N. S. Koti Mani Kumar Tirumanadham6
1Department of CSE, SRKR Engineering College (A), Bhimavaram, A.P, India
2Department of AI, Anurag University, Hyderabad, India
3Department of CSE, PVP Siddhartha Institute of Technology, Vijayawada, A.P, India
4Department of CSE, Graphic Era Hill University, Dehradun, 248002, India
4Department of CSE, Graphic Era Deemed To Be University, Dehradun, 248002, Uttarakhand, India
5,6Department of CSE, Sir C R Reddy College of Engineering, Eluru, A.P, India
Emails: dhl@srkrec.ac.in; kbrajuai@anurag.edu.in; phani.0713@gmail.com; jvnramesh@gmail.com; shariff.v@gmail.com; manikumar1248@gmail.com
Abstract
This study proposes a machine learning framework that improves both the predictive accuracy of diabetes detection and the interpretability of the diagnostic model. First, missing values are imputed with Multiple Imputation by Chained Equations (MICE). Class imbalance is then addressed with the Synthetic Minority Over-sampling Technique (SMOTE), and outliers are removed with the Interquartile Range (IQR) method to improve model robustness. A hybrid RFE-WWO feature selection procedure, which combines Recursive Feature Elimination (RFE) with Water Wave Optimization (WWO), selects the most informative features to balance model complexity against predictive accuracy. The core of the framework is the Hybrid Fusion Model (HFM), which combines the strengths of AdaBoost and CatBoost. Its hyperparameters are tuned with the Tree-Structured Parzen Estimator (TPE), and the tuned model reaches a prediction accuracy of 97.84%. Beyond accuracy, the approach also improves precision, recall, and F1 score. By combining these techniques, the framework shows significant potential for early diagnosis, supporting the view that ensemble methods are valuable for healthcare data analysis and that accurate, interpretable models are essential for dependable diagnostic tools.
Keywords: Healthcare; AdaBoost; CatBoost; hyperparameter optimization; Water Wave Optimization (WWO); Synthetic Minority Over-sampling Technique (SMOTE); Machine learning (ML)
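The abstract states that the HFM's hyperparameters are tuned with TPE but does not describe how the search proceeds. The following is a simplified, one-dimensional sketch of the TPE idea in plain Python, not the authors' implementation. Past trials are split at a quantile gamma into a "good" set, modeled by a Parzen density l(x), and the remaining trials, modeled by g(x). The next trial is the sampled candidate that maximizes l(x)/g(x). The fixed bandwidth, gamma = 0.25, the candidate count, and the toy objective are all assumptions. The objective is a stand-in for validation loss as a function of one hypothetical hyperparameter.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def parzen_density(x, centers, bandwidth):
    """Parzen (kernel density) estimate: equal-weight Gaussians centred on observed points."""
    return sum(gaussian_pdf(x, c, bandwidth) for c in centers) / len(centers)


def tpe_minimize(objective, low, high, n_startup=10, n_iter=40,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified 1-D TPE sketch; returns the best (x, loss) pair found."""
    rng = random.Random(seed)
    bandwidth = 0.1 * (high - low)  # assumption: fixed bandwidth (real TPE adapts it)

    # Warm-up: purely random trials before the densities are fitted.
    history = []
    for _ in range(n_startup):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))

    for _ in range(n_iter):
        # Split past trials at the gamma-quantile of the observed loss.
        ranked = sorted(history, key=lambda trial: trial[1])
        n_good = max(1, math.ceil(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]

        def ratio(x):
            # l(x) / g(x): a larger ratio means x resembles past good trials more than bad ones.
            g = parzen_density(x, bad, bandwidth) if bad else 1.0
            return parzen_density(x, good, bandwidth) / (g + 1e-12)

        # Draw candidates from l(x) (a good point plus Gaussian noise), clipped to the range.
        candidates = [
            min(high, max(low, rng.choice(good) + rng.gauss(0.0, bandwidth)))
            for _ in range(n_candidates)
        ]
        x_next = max(candidates, key=ratio)
        history.append((x_next, objective(x_next)))

    return min(history, key=lambda trial: trial[1])


# Toy objective standing in for validation loss; its minimum is at x = 0.3.
best_x, best_loss = tpe_minimize(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

In practice, library implementations such as Optuna's `TPESampler` or Hyperopt's `tpe.suggest` handle several mixed-type hyperparameters at once and adapt their bandwidths. This sketch only illustrates the density-ratio selection rule that gives TPE its name.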