Optimizing Diabetes Diagnosis: HFM with Tree-Structured Parzen Estimator for Enhanced Predictive Performance and Interpretability

 

Hemalatha Dendukuri1, Kachapuram Basava Raju2, S. Phani Praveen3,*, Janjhyam V. Naga Ramesh4, Vahiduddin Shariff5, N. S. Koti Mani Kumar Tirumanadham6

1Department of CSE, SRKR Engineering College (A), Bhimavaram, A.P, India

2Department of AI, Anurag University, Hyderabad, India

3Department of CSE, PVP Siddhartha Institute of Technology, Vijayawada, A.P, India

4Department of CSE, Graphic Era Hill University, Dehradun, 248002, India

4Department of CSE, Graphic Era Deemed To Be University, Dehradun, 248002, Uttarakhand, India

5,6Department of CSE, Sir C R Reddy College of Engineering, Eluru, A.P, India

Emails: dhl@srkrec.ac.in; kbrajuai@anurag.edu.in; phani.0713@gmail.com; jvnramesh@gmail.com; shariff.v@gmail.com; manikumar1248@gmail.com

 

Abstract

This study proposes a novel machine learning framework that improves both the prediction accuracy of diabetes detection and the interpretability of the diagnostic model. First, missing values are imputed with Multiple Imputation by Chained Equations (MICE). Class imbalance is addressed with the Synthetic Minority Over-sampling Technique (SMOTE), and outliers are removed with the Interquartile Range (IQR) method to improve model robustness. A hybrid RFE-WWO procedure, which combines Recursive Feature Elimination (RFE) with Water Wave Optimization (WWO), selects the features that best balance model complexity against prediction accuracy. At the core of the framework is the Hybrid Fusion Model (HFM), which combines the strengths of AdaBoost and CatBoost. Hyperparameters are tuned with the Tree-Structured Parzen Estimator (TPE), and the tuned model reaches a prediction accuracy of 97.84%. In addition to higher accuracy, the approach improves the precision, recall, and F1 score of the predictive model. By combining these techniques, the framework shows significant potential for early diagnosis and underlines the value of ensemble methods in healthcare data analysis, where accurate and interpretable models are vital for building dependable diagnostic tools.

Keywords: Healthcare; AdaBoost; CatBoost; Hyperparameter optimization; Water Wave Optimization (WWO); Synthetic Minority Over-sampling Technique (SMOTE); Machine learning (ML)
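
As an illustration of the IQR outlier-removal step named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the common Tukey fence multiplier k = 1.5 and inclusive quartile interpolation, neither of which is stated in the abstract. The function name iqr_filter and the sample glucose readings are hypothetical.

```python
from statistics import quantiles


def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional Tukey fence; it is an assumption here,
    since the abstract does not give the multiplier.
    """
    # Quartiles of the data, interpolated between the sample min and max.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical glucose readings with one extreme value.
glucose = [85, 90, 95, 100, 105, 110, 115, 400]
kept = iqr_filter(glucose)
print(kept)
```

With these sample values the quartiles are Q1 = 93.75 and Q3 = 111.25, so the fences are 67.5 and 137.5, and only the reading of 400 is discarded. In a tabular dataset the same rule would typically be applied per feature.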