Case study on the capabilities of AI generative text detection
This study investigates the effectiveness of AI detection tools, specifically GPTZero and Turnitin, in identifying content generated by AI language models. The research was driven by the hypothesis that an AI detection tool should detect AI-written content approximately 99% of the time, a critical threshold if educators are to make reliable and repeatable judgments on potential cases of academic misconduct. The study utilised an experimental research design: essays were generated by AI language models and their detection rates were tested. The essays were also paraphrased using an AI tool to simulate post-generation modifications. The findings present a nuanced picture, with detection rates ranging from 0% to 100% across different essays and detection tools. The study uses the terms "False Negatives" and "False Positives" to describe specific outcomes of the detection process. The results underscore the complexity of reliably detecting AI-generated content and highlight the need for further research and development in AI detection tools. The study concludes by leaving open the question of whether any AI detection tool can be relied upon, given the inherent variability in AI-generated content. These early findings provide initial insights into the performance of AI detection tools and the challenges in this field.
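The false-negative and false-positive outcomes described above can be tallied from detector verdicts against known ground truth. The sketch below is a minimal, hypothetical illustration (not the study's actual analysis code, and the labels are invented): a false negative is AI-written text the detector passes as human, and a false positive is human-written text the detector flags as AI.

```python
# Hypothetical sketch of scoring detector verdicts against ground truth.
# Labels are "ai" or "human"; the example data below is invented.

def error_rates(ground_truth, predictions):
    """Return (false-negative rate, false-positive rate) for paired labels."""
    fn = sum(1 for g, p in zip(ground_truth, predictions)
             if g == "ai" and p == "human")    # AI text the detector missed
    fp = sum(1 for g, p in zip(ground_truth, predictions)
             if g == "human" and p == "ai")    # human text wrongly flagged
    ai_total = sum(1 for g in ground_truth if g == "ai")
    human_total = sum(1 for g in ground_truth if g == "human")
    fn_rate = fn / ai_total if ai_total else 0.0
    fp_rate = fp / human_total if human_total else 0.0
    return fn_rate, fp_rate

# Invented example: 3 AI essays, 2 human essays.
truth = ["ai", "ai", "ai", "human", "human"]
preds = ["human", "ai", "ai", "ai", "human"]
fn_rate, fp_rate = error_rates(truth, preds)
```

Under the study's 99% hypothesis, a detector would need a false-negative rate of roughly 1% or lower before its verdicts could ground repeatable misconduct judgments.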