Optimizing Random Forest for Handwritten Digit Recognition Through Hyper-parameter Tuning

 

Yaqeen Saad Ali1, Rihab Hazim Qasim1, Sura Mahroos Searan1, Othman Mohammed Jasim2,*,
Ibaa Sadoon Jabbar Alzubaydı3

1Department of Computer Science, College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq

2Department of Computer Engineering Techniques, College of Technical Engineering, University of Al Maarif, Al Anbar, 31001, Iraq

3Construction and Projects Department, University of Technology, Baghdad, Iraq

Emails: yaqeen.cs91@uoanbar.edu.iq; rehz1991@uoanbar.edu.iq; surasms917@uoanbar.edu.iq; othman.jaseem@uoanbar.edu.iq; Ibaa.s.Jabbar@uotechnology.edu.iq

 

Abstract

The rapid growth in the volume of newly released records and multimedia content poses fresh challenges for pattern recognition and machine learning, particularly for the longstanding problem of recognizing handwritten digits. Handwriting recognition, the ability of a computer to automatically identify and interpret handwritten digits or characters, is a compelling research area because every individual's handwriting style is unique. Hyper-parameters play a crucial role in machine learning algorithms: they directly influence the training process and significantly affect the performance of the resulting model. This work applies three general automated hyper-parameter tuning methods, namely grid search, random search, and Bayesian optimization, to optimize the parameters of a random forest classifier on pre-processed images from the MNIST digit database. These methods can identify optimal hyper-parameters for a wide variety of ML models while taking the time cost of the search into account. The work demonstrates the effectiveness and efficiency of the techniques used, both of which are crucial for real-world applications. With the random forest algorithm, the results show an accuracy of 99.3% for grid search, 98.8% for random search, and 96.0% for Bayesian optimization.

 

Keywords: Handwriting recognition; MNIST dataset; Random forest algorithm; Grid search; Random search; Bayesian optimization
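
To make the tuning strategies named in the abstract concrete, the following is a minimal sketch of grid search and random search over a random forest, not the authors' actual experimental setup. It assumes scikit-learn and SciPy are installed. The small 8x8 digits set bundled with scikit-learn stands in for MNIST so that the example runs quickly, and the search ranges, fold count, and random seeds are illustrative choices, so the accuracies it prints are not comparable to those reported above. Bayesian optimization is left out because it typically requires an additional library.

```python
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Assumption: scikit-learn's 8x8 digits set stands in for MNIST to keep the run short.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Grid search: exhaustively evaluates every combination in a fixed grid.
# The grid values below are illustrative, not taken from the paper.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X_train, y_train)

# Random search: samples a fixed number of configurations (n_iter)
# from the given distributions instead of trying every combination.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 60),
        "max_depth": [None, 5, 10, 20],
    },
    n_iter=4,
    cv=3,
    random_state=0,
)
rand.fit(X_train, y_train)

# Both search objects refit the best configuration on the full training split,
# so score() reports held-out accuracy of that best model.
grid_acc = grid.score(X_test, y_test)
rand_acc = rand.score(X_test, y_test)
print("grid search:", grid.best_params_, round(grid_acc, 3))
print("random search:", rand.best_params_, round(rand_acc, 3))
```

The two searches share the same estimator and cross-validation setting, so the only difference is how candidate configurations are chosen: the grid evaluates all four combinations, while the random search draws four samples from a wider space at the same fitting cost.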