Abstract
Rapid advances in generative artificial intelligence (GenAI) have transformed language learning, particularly preparation for standardized language tests, by giving learners immediate, adaptive, and data-driven feedback on their writing. Such feedback helps learners refine linguistic accuracy, coherence, and argumentation through tailored guidance that closely mirrors the evaluation criteria of high-stakes assessments such as the IELTS. This study therefore compares scoring by human evaluators and by GenAI for the IELTS Academic Writing module. Four models (ChatGPT, Claude, Gemini, and DeepSeek) were assessed using 110 IELTS Task 1 and Task 2 essays, and their scores were compared with those of human markers through statistical analyses, including the Pearson correlation coefficient (PCC), the intraclass correlation coefficient (ICC), and mean absolute error (MAE). The findings indicated varying degrees of alignment between GenAI and human scores, with DeepSeek exhibiting the highest correspondence with human assessments. A qualitative content analysis of the GenAI-generated comments examined their detail, organization, and utility. Notably, ChatGPT provided comprehensive, well-organized, and actionable feedback; Claude offered a balanced approach; Gemini focused on strengths; and DeepSeek delivered more concise but less actionable feedback. This study underscores the potential of GenAI as a supplementary tool for self-directed learning (SDL) in IELTS preparation and assessment, while also highlighting limitations such as the depth of feedback and the risk of over-reliance on technology. Ethical considerations, including algorithmic bias, data privacy, and accessibility, further underscore the need for human oversight in AI-driven assessments. Future research should investigate hybrid AI-human feedback models and conduct longitudinal studies on the impact of GenAI feedback on writing proficiency.
| Original language | English |
|---|---|
| Title of host publication | Creativity and New Technologies in Learning for the Workplace and Higher Education |
| Editors | David Guralnick, Michael E. Auer, Antonella Poce |
| Place of Publication | Cham |
| Publisher | Springer Nature |
| Chapter | 30 |
| Pages | 397-408 |
| Number of pages | 12 |
| ISBN (Electronic) | 978-3-032-09908-2 |
| ISBN (Print) | 978-3-032-09907-5 |
| Publication status | Published - 9 Jan 2026 |
Publication series
| Name | Lecture Notes in Networks and Systems |
|---|---|
| Publisher | Springer |
| Volume | 1702 |
| ISSN (Print) | 2367-3370 |
| ISSN (Electronic) | 2367-3389 |
Funding
The author would like to sincerely thank the Indonesian Endowment Fund for Education (LPDP) for their support and financial assistance in completing this conference proceeding article.
Keywords
- generative artificial intelligence
- language assessment
- self-directed learning
- IELTS writing module
Fingerprint
Dive into the research topics of 'From Prompts to Scores: Generative AI Versus Human Grading in Writing for Standardized Language Testing'. Together they form a unique fingerprint.