International Journal of Artificial Intelligence and Education Technology


ISSN (Online): 2835-2432

Volume 4, Issue 1, pp. 23-35, 2025 | Full Length Article

Analytic Trait Scoring of English Language Learner Essays Using Fine-Tuned DeBERTa-v3 on the ELLIPSE Corpus

Andino Maseleno 1*, Meinhaj Hussain 2

  • 1 Institut Bakti Nusantara, Lampung, Indonesia - (andino.maseleno@ibnus.ac.id)
  • 2 Rennier University, Ireland - (meinhaj@rennier.online)
  • DOI: https://doi.org/10.54216/IJAIET.040103

    Abstract

    Automated Essay Scoring (AES) technologies have been extensively researched for holistic, topic-specific scoring, but their use to predict multiple analytic writing-quality traits of English Language Learner (ELL) student essays has received less attention. This research addresses that gap by systematically investigating multi-trait AES on the ELLIPSE corpus (Learning Agency Lab, 2022), a publicly accessible dataset of 6,482 argumentative essays written by ELLs in grades 8-12 and scored by human raters on six analytic traits: cohesion, syntax, vocabulary, phraseology, grammar, and conventions. We experiment with five models: Ridge regression, Support Vector Regression (SVR) with a radial basis function (RBF) kernel, Random Forest, fine-tuned BERT-base-uncased, and fine-tuned DeBERTa-v3-base. The mean Quadratic Weighted Kappa (QWK) across the six traits is highest for DeBERTa-v3 (0.726), a 26.5-point improvement over the Ridge baseline (0.461) and a 6-point improvement over BERT (0.666). Phraseology is the most difficult trait to score automatically (DeBERTa QWK = 0.701) and cohesion the easiest (DeBERTa QWK = 0.742). Analysis of inter-trait correlations reveals high co-variation between vocabulary and phraseology (r = 0.79), which may reflect common linguistic skills that could be leveraged by multi-task learning. This research sets a replicable baseline for multi-trait AES on the ELLIPSE corpus and suggests that phraseology scoring is the most urgent target for future architectural innovation.
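    The evaluation metric used above, mean Quadratic Weighted Kappa across the six analytic traits, can be sketched as follows. The trait names come from the paper; the `mean_qwk` helper, the toy score arrays, and the integer-doubling of ELLIPSE's half-point 1.0-5.0 scale are illustrative assumptions, not the authors' code. The sketch uses scikit-learn's `cohen_kappa_score` with quadratic weights.

```python
# Sketch: mean Quadratic Weighted Kappa (QWK) over the six ELLIPSE traits.
# Trait names follow the paper; score data below is toy data for illustration.
from sklearn.metrics import cohen_kappa_score

TRAITS = ["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]

def mean_qwk(y_true_by_trait, y_pred_by_trait):
    """Average QWK across traits; each entry is a list of discrete score labels."""
    kappas = {
        t: cohen_kappa_score(y_true_by_trait[t], y_pred_by_trait[t], weights="quadratic")
        for t in TRAITS
    }
    return sum(kappas.values()) / len(kappas), kappas

# QWK needs discrete labels, so half-point ELLIPSE scores (1.0, 1.5, ..., 5.0)
# can be doubled to integers 2..10 before computing kappa (an assumed convention).
true_scores = {t: [2, 4, 6, 8, 10, 6] for t in TRAITS}
pred_scores = {t: [2, 4, 6, 8, 8, 6] for t in TRAITS}
avg_qwk, per_trait_qwk = mean_qwk(true_scores, pred_scores)
```

    Because the weights grow quadratically with the distance between rater and model scores, a prediction two points off is penalized four times as heavily as one point off, which is why QWK is the standard agreement metric in AES work.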

    Keywords:

    Automated essay scoring, English language learners, Multi-trait assessment, DeBERTa, Fine-tuning, ELLIPSE corpus, Quadratic weighted kappa, Analytic writing rubric


    Cite This Article As:
    Maseleno, Andino, and Meinhaj Hussain. "Analytic Trait Scoring of English Language Learner Essays Using Fine-Tuned DeBERTa-v3 on the ELLIPSE Corpus." International Journal of Artificial Intelligence and Education Technology, vol. 4, no. 1, 2025, pp. 23-35. DOI: https://doi.org/10.54216/IJAIET.040103