Predicting Academic Outcomes in Secondary Education: Ensemble
Classification with Grade Trajectories, Attendance Behaviour, and
Socioeconomic Context
Jehad Mousa1,2,∗, Abdallah Salama3
1University of Dubai, UAE
2United Arab Emirates University, UAE
3Assistant Professor in Sociology, City University Ajman, Ajman, UAE
Emails: Jehadgmousa@gmail.com; a.adel@cu.ac.ae
Abstract
Early identification of students at risk of academic failure is a persistent challenge in educational technology, with direct
implications for student retention, institutional equity, and the allocation of support resources. Although supervised
machine learning has been widely applied to student outcome prediction, the relative merit of competing algorithm
classes and the degree to which demographic and behavioural features contribute predictive power beyond prior academic
assessments remain incompletely resolved in the secondary school context. This paper presents a structured comparative
evaluation of five supervised classifiers trained on periodic grades, attendance records, sociodemographic
characteristics, and lifestyle indicators drawn from secondary school students. A dual importance analysis,
combining impurity-based measures with permutation importance computed on held-out data, separates the predictive roles of
grade trajectories, absenteeism, parental background, and lifestyle variables. The ensemble methods outperform the other
classifiers on every evaluation criterion, with prior periodic assessments and attendance emerging as the dominant
predictors. Parental education level introduces a socioeconomic gradient that operates independently of student-controlled
factors, generating structural inequities that standard grade-monitoring systems are unlikely to address. These findings
provide both a methodological benchmark for secondary school prediction tasks and practical guidance for institutions
designing equitable and evidence-based early warning interventions.
Keywords: Educational data mining; Machine learning; Student outcome prediction; Ensemble methods; Learning
analytics; Secondary education; Early warning systems
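
The dual importance analysis named in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and a random forest as the ensemble classifier; the abstract specifies neither the library nor the model, so both are assumptions. The data are synthetic, and the feature names (`informative_*`, `noise_*`) are hypothetical placeholders rather than the study's variables. Impurity-based importance is read from the fitted trees, while permutation importance is measured on a held-out split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def dual_importance(X, y, feature_names, seed=0):
    """Return impurity-based and held-out permutation importance per feature."""
    # Hold out a test split so permutation importance reflects generalisation,
    # not memorisation of the training data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Assumption: a random forest stands in for the paper's ensemble methods.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)

    # Impurity-based importance: mean decrease in impurity across the trees,
    # computed from the training data and normalised to sum to 1.
    impurity = model.feature_importances_

    # Permutation importance: mean drop in held-out accuracy when one
    # feature column is shuffled, averaged over n_repeats shuffles.
    perm = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=seed
    )

    return {
        name: {"impurity": float(imp), "permutation": float(pm)}
        for name, imp, pm in zip(feature_names, impurity, perm.importances_mean)
    }


# Synthetic stand-in data: with shuffle=False the first three columns are
# informative and the last two are pure noise.
X, y = make_classification(
    n_samples=600,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
names = ["informative_0", "informative_1", "informative_2", "noise_0", "noise_1"]

scores = dual_importance(X, y, names)
for name, s in scores.items():
    print(f"{name:15s} impurity={s['impurity']:.3f} permutation={s['permutation']:.3f}")
```

Comparing the two columns is the point of the exercise: impurity-based scores are derived from training data and can favour high-cardinality features, whereas the permutation scores are tied to held-out performance, so a feature that ranks highly on both is a more credible predictor.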