Volume 4, Issue 1, pp. 08-22, 2025 | Full Length Article
Aa Hubur¹*, Aygul Z. Ibatova²
Student retention in higher education institutions is a critical problem that creates academic and financial challenges for individual students, for institutions, and for entire countries. Research on student retention is valuable because predicting which students are at risk enables institutions to provide timely and appropriate interventions. The present study compares five machine learning classifiers (Linear Discriminant Analysis, K-Nearest Neighbours, Support Vector Machine, Random Forest, and Gradient Boosting) on data from 4,424 students in the Realinho et al. (2022) dataset, which contains demographic, socioeconomic, macroeconomic, and academic performance data collected over a decade at a Portuguese higher education institution. Before model training, a mutual information feature selection step reduces the 22-dimensional feature space to the 12 features that share the most information with the outcome and therefore have the highest discriminative power. Under five-fold stratified cross-validation, a Support Vector Machine (SVM) with a radial basis function (RBF) kernel achieves the best overall performance, with an accuracy of 97.1% and an F1 score of 0.954, and all five models achieve an AUC greater than 0.981. Feature importance analysis reveals that four academic performance measures from the first two semesters together account for 87.6% of the importance that the Random Forest model assigns to its predictors. The most important single predictor is the number of curricular units the student passes in the second semester (importance = 0.335), whereas all socioeconomic, demographic, and macroeconomic factors combined contribute less than 13%. Through this empirical measurement of risk factors, the study draws three implications for student retention.
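The abstract does not state how mutual information was estimated or how the stratified folds were built. The following is a minimal, dependency-free sketch of those two preprocessing steps that assumes discrete (or pre-discretized) features. It uses the plug-in estimate I(X; Y) = Σ p(x, y) log[p(x, y) / (p(x) p(y))] computed from counts. This estimator is an assumption of the sketch: scikit-learn's `mutual_info_classif`, for example, uses a nearest-neighbour estimator for features it treats as continuous. In scikit-learn the same steps would typically be `SelectKBest(mutual_info_classif, k=12)` and `StratifiedKFold(n_splits=5)`. The function names, feature names, and toy data below are hypothetical and do not come from the study.

```python
import math
import random
from collections import Counter, defaultdict


def mutual_information(feature, labels):
    """Plug-in estimate of I(X; Y), in nats, for one discrete feature X and class labels Y.

    I(X; Y) = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) * p(y))), with every
    probability replaced by its empirical frequency.
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log((c/n) / ((px/n) * (py/n))) simplifies to (c/n) * log(c*n / (px*py))
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(columns, labels, k):
    """Score each feature column by mutual information with the labels and keep the k best.

    `columns` maps a (hypothetical) feature name to its list of discrete values, one per student.
    """
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds that roughly preserve the class proportions.

    The indices of each class are shuffled, then dealt round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % n_folds].append(i)
    return folds


if __name__ == "__main__":
    # Hypothetical toy data: 40 students, label 1 = dropout, 0 = graduate.
    labels = [0, 0, 0, 0, 1, 1, 1, 1] * 5
    columns = {
        "units_passed_sem2": [6, 6, 6, 6, 3, 3, 3, 3] * 5,  # mirrors the label exactly
        "debtor": [0, 0, 0, 1, 1, 1, 1, 1] * 5,  # partly informative
        "noise": [0, 1, 0, 1, 0, 1, 0, 1] * 5,  # independent of the label
    }
    kept, scores = select_top_k(columns, labels, k=2)
    print(kept)  # ['units_passed_sem2', 'debtor']
    for fold in stratified_folds(labels, n_folds=5):
        print(len(fold), sum(labels[i] for i in fold))  # 8 students, 4 dropouts per fold
```

Ranking all 22 candidate columns this way and keeping k = 12 corresponds to the selection step the abstract describes. For brevity, this sketch scores features on the whole toy set. Computing the scores inside each training fold instead keeps held-out fold information out of the selection.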
Keywords: Student dropout prediction, Machine learning, Educational data mining, Mutual information feature selection, Higher education analytics, Support vector machine, Random Forest, Early warning systems
Alnasyan, B., Basheri, M., & Alassafi, M. (2024). The power of deep learning techniques for predicting student performance in virtual learning environments: A systematic literature review. Computers and Education: Artificial Intelligence, 6, Article 100231. https://doi.org/10.1016/j.caeai.2024.100231
Althibyani, H. (2024). Predicting student success in MOOCs: A comprehensive analysis using machine learning models. PeerJ Computer Science, 10, Article e2221. https://doi.org/10.7717/peerj-cs.2221
Baker, R. S., & Hawn, A. (2022). Algorithmic bias in education. International Journal of Artificial Intelligence in Education, 32(4), 1052–1092. https://doi.org/10.1007/s40593-021-00285-9
Borna, M.-R., Saadat, H., Hojjati, A. T., & Akbari, E. (2024). Analyzing click data with AI: Implications for student performance prediction and learning assessment. Frontiers in Education, 9, Article 1421479. https://doi.org/10.3389/feduc.2024.1421479
González-Nucamendi, A., Noguez, J., Neri, L., Robledo-Rella, V., & García-Castelán, R. M. G. (2023). Predictive analytics study to determine undergraduate students at risk of dropout. Frontiers in Education, 8, Article 1244686. https://doi.org/10.3389/feduc.2023.1244686
Guanin-Fajardo, J. H., Guaña-Moya, J., & Casillas, J. (2024). Predicting academic success of college students using machine learning techniques. Data, 9(4), Article 60. https://doi.org/10.3390/data9040060
Hlosta, M., Herodotou, C., Papathoma, T., Gillespie, A., & Bergamin, P. (2022). Predictive learning analytics in online education: A deeper understanding through explaining algorithmic errors. Computers and Education: Artificial Intelligence, 3, Article 100108. https://doi.org/10.1016/j.caeai.2022.100108
Jin, L., Wang, Y., Song, H., & So, H.-J. (2024). Predictive modelling with the Open University Learning Analytics Dataset (OULAD): A systematic literature review. In Artificial intelligence in education. Posters and late breaking results, workshops and tutorials (AIED 2024) (Vol. 2150, pp. 477–484). Springer. https://doi.org/10.1007/978-3-031-64315-6_46
Mahafdah, R., Bouallegue, S., & Bouallegue, R. (2024). Enhancing e-learning through AI: Advanced techniques for optimizing student performance. PeerJ Computer Science, 10, Article e2576. https://doi.org/10.7717/peerj-cs.2576
Realinho, V., Machado, J., Baptista, L., & Martins, M. V. (2022). Predicting student dropout and academic success. Data, 7(11), Article 146. https://doi.org/10.3390/data7110146