Volume 5 • Issue 1 • PP: 58–72 • 2026
ChatGPT as an Assessment Design Tool in Higher Education: Evaluating Item Quality, Bloom’s Taxonomy Coverage, and Faculty Acceptance Across Academic Disciplines
Abstract
The emergence of large language models capable of generating coherent, contextually grounded text at scale has given higher education a new and contested tool for assessment design: instructors can now produce examination questions, assignment prompts, and feedback rubrics in seconds rather than hours. Whether the items these systems produce meet the quality standards required for valid, reliable, and pedagogically appropriate higher education assessment is an empirical question that the literature has only partially addressed. This paper reports a three-study investigation of ChatGPT as an assessment design tool in higher education, covering item quality, cognitive-level coverage, student performance, and faculty acceptance. Study 1 presents an expert-panel evaluation of 360 assessment items (180 generated by ChatGPT and 180 created by experienced instructors) spanning six academic disciplines and four item types, each rated on seven quality dimensions including content accuracy, Bloom’s taxonomy alignment, linguistic clarity, and originality. Study 2 reports a faculty survey of 186 instructors examining adoption rates, perceived benefits, concerns, and predictors of acceptance. Study 3 compares the performance of 412 students on counterbalanced ChatGPT-generated and instructor-created assessment items. ChatGPT-generated items score significantly below instructor-created items on Bloom’s taxonomy alignment and originality, but perform comparably or better on linguistic clarity and difficulty calibration. Student performance is modestly but significantly higher on ChatGPT-generated items, a finding that challenges simple assumptions about the difficulty of AI-generated assessment items. Academic integrity and inadequate coverage of higher-order cognitive levels are the dominant faculty concerns, while saving time is the most consistently cited benefit, with an average 77% reduction in item-writing time.
The paper contributes a validated multi-dimensional item quality framework, a faculty acceptance model, and eight evidence-based guidelines for the responsible integration of ChatGPT in assessment design workflows.
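The abstract does not state how the Study 3 items were counterbalanced. As a minimal sketch, assuming a common two-form design in which every content slot exists in both a ChatGPT-generated and an instructor-created version, the code below builds the two forms, randomly splits students between them, and computes each student's within-person difference in proportion correct. The function names, the `gpt`/`instr` labels, and the example scores are hypothetical and are not taken from the study; the sketch stops at the descriptive difference and does not reproduce the significance test reported in the paper.

```python
import random
from statistics import mean


def build_forms(n_slots: int) -> dict[str, list[str]]:
    """Return two parallel forms as lists of item sources per content slot.

    Hypothetical assumption: each slot exists in two versions, 'gpt'
    (ChatGPT-generated) and 'instr' (instructor-created). Form A alternates
    gpt/instr starting with gpt; Form B swaps every slot, so each version of
    each slot is answered by one of the two groups. Requires n_slots >= 2 so
    that every form contains items from both sources.
    """
    form_a = ["gpt" if i % 2 == 0 else "instr" for i in range(n_slots)]
    form_b = ["instr" if src == "gpt" else "gpt" for src in form_a]
    return {"A": form_a, "B": form_b}


def assign_forms(student_ids: list[str], seed: int = 0) -> dict[str, str]:
    """Randomly order students, then alternate A/B so group sizes differ by at most one."""
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    return {sid: "A" if i % 2 == 0 else "B" for i, sid in enumerate(ids)}


def mean_within_student_difference(
    scores: dict[str, list[int]],
    forms: dict[str, list[str]],
    assignment: dict[str, str],
) -> float:
    """Mean over students of (proportion correct on gpt items minus
    proportion correct on instr items). Positive values favour gpt items."""
    diffs = []
    for sid, item_scores in scores.items():
        sources = forms[assignment[sid]]
        gpt = [s for s, src in zip(item_scores, sources) if src == "gpt"]
        instr = [s for s, src in zip(item_scores, sources) if src == "instr"]
        diffs.append(mean(gpt) - mean(instr))
    return float(mean(diffs))


if __name__ == "__main__":
    forms = build_forms(4)
    assignment = {"s1": "A", "s2": "B"}
    scores = {"s1": [1, 0, 1, 1], "s2": [0, 1, 1, 1]}  # hypothetical 0/1 item scores
    print(mean_within_student_difference(scores, forms, assignment))  # prints 0.5
```

Because every student answers items from both sources, and every content slot appears in both versions across the two groups, differences in slot difficulty are balanced across the two sources rather than confounded with them, which is the purpose of counterbalancing in a comparison of this kind.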