Predicting Student Outcomes: Evaluating Regression Techniques in Educational Data

Manish Kumar Singla1,∗, Faris H. Rizk2, Mahmoud Elshabrawy Mohamed2, Ahmed Mohamed Zaki2

1Department of Interdisciplinary Courses in Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab, India.

2Computer Science and Intelligent Systems Research Center, Blacksburg 24060, Virginia, USA

Emails: manish.singla@chitkara.edu.in, faris.rizk@jcsis.org, mshabrawy@jcsis.org, Azaki@jcsis.org

Abstract

Predicting student performance is essential because it lets institutions identify weak performers and initiate corrective measures. This study evaluates several regression models on a dataset from Kaggle. The workflow comprises data cleaning (handling missing values and scaling the data), feature extraction, model fitting, and validation. The models evaluated are Linear Regression, SVR, MLPRegressor, Gradient Boosting, CatBoost, XGBoost, Random Forest, Extra Trees, Decision Tree, and K-Nearest Neighbors. Linear Regression produced the best result, with the lowest MSE of 0.000521 and strong scores on the other measures, including RMSE, MAE, and R². The results indicate that regression models can predict students’ performance and can be useful to the various stakeholders in the education system. The findings of this study can inform the development of decision-support models aimed at improving students’ performance.

Keywords: Student performance prediction, regression models, educational data, data preprocessing, predictive analytics
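
For readers unfamiliar with the four evaluation measures named in the abstract (MSE, RMSE, MAE, and R²), the following is a minimal sketch of how they are conventionally computed. The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error over all samples.
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Hypothetical scaled scores, for illustration only (not the study's data).
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(y_true, y_pred)
```

Lower MSE, RMSE, and MAE indicate smaller prediction errors, while R² closer to 1 indicates that the model explains more of the variance in the target.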