Volume 4, Issue 1, pp. 50–66, 2025 | Full Length Article
Emad Bashkail 1 * , Nesrin Merhi 2
Identifying, early in a course, students who are likely to struggle academically or withdraw gives universities a brief window in which to intervene effectively. This paper systematically evaluates multiple machine learning classifiers on the Open University Learning Analytics Dataset (OULAD), one of the most widely used public educational datasets, which covers 32,593 students across 22 distance-learning courses. Four classifiers (logistic regression, decision tree, random forest, and gradient boosting) are trained on a feature set that combines student demographics with engagement measures derived from virtual learning environment (VLE) clickstreams. The primary finding is that VLE behavioral features dominate the Random Forest model: total click volume, active VLE days, and typical daily click volume are among its top four features, which together account for 92.8% of total importance, while demographic information contributes less. Random Forest achieves the strongest held-out test performance (AUC = 0.998, F1 = 0.978, accuracy = 98.2%), while the more interpretable Decision Tree scores lower (AUC = 0.959), illustrating the performance cost of requiring an interpretable model. At-risk students record 75.8% fewer total VLE clicks than students who are not at risk, averaging 49.0 clicks versus 203.0 (t = 104.0, p < 0.001). The paper documents the complete end-to-end prediction pipeline, the model evaluation framework, and the dataset so that future researchers can reproduce the study. The results have direct implications for the design of early-alert systems and the ethical deployment of predictive analytics in higher education.
Learning analytics, Virtual learning environment, At-risk prediction, Random forest, OULAD, Educational data mining, Student engagement, Early warning system
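The engagement gap reported in the abstract (a relative decrease in mean clicks and a two-sample t statistic) can be illustrated with a minimal sketch. The abstract does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here. The click counts are hypothetical toy values, chosen only so that their means match the reported 203.0 and 49.0; they are not OULAD data, and the resulting t value will not match the reported 104.0.

```python
import math
from statistics import mean, variance


def relative_decrease(reference_mean, group_mean):
    """Fractional drop of group_mean relative to reference_mean."""
    return (reference_mean - group_mean) / reference_mean


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / std_err


# Hypothetical toy click counts (NOT taken from OULAD).
not_at_risk_clicks = [180, 220, 195, 240, 205, 178]  # mean 203
at_risk_clicks = [40, 55, 62, 38, 50, 49]            # mean 49

drop = relative_decrease(mean(not_at_risk_clicks), mean(at_risk_clicks))
t_stat = welch_t(not_at_risk_clicks, at_risk_clicks)

print(f"relative decrease in mean clicks: {drop:.4f}")
print(f"Welch t statistic (toy data): {t_stat:.2f}")
```

With the rounded means the relative decrease evaluates to roughly 0.76; a small difference from the reported 75.8% would be expected if the published figure was computed from unrounded means.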