ChatGPT as an Assessment Design Tool in Higher Education:

Evaluating Item Quality, Bloom’s Taxonomy Coverage, and

Faculty Acceptance Across Academic Disciplines

Nadia Iftikhar1,*, Rabia Muslu2

1 UCAM Catholic University of Murcia, Spain

2 Amity University Dubai, UAE

Received: November 04, 2025; Revised: December 11, 2025; Accepted: January 18, 2026. * Corresponding author

ABSTRACT

The emergence of large language models capable of generating coherent, contextually grounded text at scale has

created a new and contested tool for higher education assessment design: instructors can now produce examination

questions, assignment prompts, and feedback rubrics in seconds rather than hours. Whether the items produced

by these systems meet the quality standards required for valid, reliable, and pedagogically appropriate higher

education assessment is an empirical question that the literature has only partially addressed. This paper reports

a three-study investigation of ChatGPT as an assessment design tool in higher education, covering item quality,

cognitive level coverage, student performance, and faculty acceptance. Study 1 presents an expert-panel evaluation of

360 assessment items—180 generated by ChatGPT and 180 created by experienced instructors—across six academic

disciplines and four item types, rated on seven quality dimensions including content accuracy, Bloom’s taxonomy

alignment, linguistic clarity, and originality. Study 2 reports a faculty survey of 186 instructors examining adoption

rates, perceived benefits, concerns, and the predictors of acceptance. Study 3 compares the performance of 412

students on counterbalanced ChatGPT-generated and instructor-created assessment items. ChatGPT-generated items

score significantly below instructor-created items on Bloom’s taxonomy alignment and originality, but perform

comparably or better on linguistic clarity and difficulty calibration. Student performance is modestly but significantly

higher on ChatGPT-generated items, a finding that challenges simple assumptions about AI-generated assessment

difficulty. Academic integrity and higher-order cognitive coverage are the dominant faculty concerns,

while time savings, averaging a 77% reduction in item-writing time, are the most consistently cited benefit. The

paper contributes a validated multi-dimensional item quality framework, a faculty acceptance model, and eight

evidence-based guidelines for the responsible integration of ChatGPT in assessment design workflows.

Keywords: ChatGPT; Assessment design; Exam questions; Bloom’s taxonomy; Item quality; Faculty acceptance;

Generative AI; Higher education; Artificial intelligence in education

1. INTRODUCTION

Assessment design is among the most time-intensive components

of university teaching. Creating a single well-constructed

multiple-choice question that targets a specific

cognitive level, avoids item-writing flaws, and adequately

discriminates between students can take an experienced instructor

15–20 minutes [1]. Scaling that effort across a full