Artificial intelligence (AI) detectors might present scientific-looking percentages or scores, but nobody should treat those results as fact. There are far too many cases of human-written texts being erroneously flagged. Concerns over false positives have been most publicized in higher education, but they remain worrisome for educators at all levels.
Many teachers treat AI detectors as a silver bullet that can do the difficult work of identifying possible academic misconduct. They are wrong. The reality is that these products are far from perfect. My favorite example of just how imperfect they can be is when GPTZero claimed the U.S. Constitution was written by AI.
Recently, a Reddit user posted a concern that has since gained traction. This user described how, after extending their original human-written essay with content from ChatGPT, they decided to check the authenticity of their work using an AI detector. While they anticipated the mixed content would trigger an AI flag, they did not expect their 100% human-written essay to be flagged as “partially written by AI.” This revelation ignited a flurry of responses from fellow Redditors who overwhelmingly highlighted their experiences involving the inaccuracies of these AI detectors.
The consensus among the responses was as clear as day: AI detectors have a long way to go in terms of accuracy. One commenter candidly stated, “No AI detectors are even close to accurate.” Another highlighted the possibility of false positives, suggesting the use of Google Docs to retain version histories of written papers. In their view, doing so provides evidence of a paper’s evolution.
Many Redditors noted that the detection is nothing to worry about because everyone knows GPT detectors are not accurate. They claim that detection tools can only raise suspicion, not serve as grounds to accuse a student of anything. Most agreed it is wise to keep some kind of evidence of one’s work in case a similar problem arises. On the humorous side, one comment quipped, “ChatGPT! Is that you?” hinting at the irony of the situation.
But Why Do AI Detectors Flag Human-Written Text?
To begin with, what do we mean by false positives? A false positive is when an AI detector incorrectly identifies human-created content as likely generated by an AI. Tools like Turnitin and GPTZero suffer from false positives that can see innocent students accused of cheating.
In June this year, Turnitin reported that on a sentence-by-sentence level, its software erroneously flags 4% of writing as being AI-generated. There is a higher incidence of these false positives in cases where Turnitin detects that less than 20% of a document is AI-generated.
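A 4% sentence-level error rate may sound small, but it compounds quickly across a full essay. The back-of-the-envelope sketch below shows why; the essay length and class size are assumed for illustration, and it treats each sentence as an independent coin flip, which real detectors do not:

```python
# What a 4% sentence-level false positive rate can mean in practice.
# Essay length and class size are hypothetical; sentences are treated
# as independent, which is an oversimplification.
false_positive_rate = 0.04   # Turnitin's reported sentence-level figure
sentences_per_essay = 50     # assumed average essay length
students = 30                # assumed class size

# Expected number of human-written sentences wrongly flagged per essay
flagged_per_essay = false_positive_rate * sentences_per_essay

# Chance that a fully human-written essay contains at least one flagged sentence
p_some_flag = 1 - (1 - false_positive_rate) ** sentences_per_essay

print(f"Expected flagged sentences per essay: {flagged_per_essay:.1f}")
print(f"Probability an honest essay gets at least one flag: {p_some_flag:.0%}")
print(f"Expected essays with a flag in a class of {students}: {students * p_some_flag:.0f}")
```

Under these toy assumptions, most fully human-written essays in the class would contain at least one wrongly flagged sentence, which is exactly why Turnitin warns that low-percentage results are the least reliable.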
One of the primary reasons AI detectors flag human-written content is that they are built to identify patterns. Modern language models like GPT-4 are trained on vast amounts of text, encompassing many of the common structures and styles found in human writing. As a result, a well-structured, grammatically correct essay can itself raise flags. The danger of false positives underscores the need for better calibration and appropriate benchmarks for these tools. Moreover, as AI models become more advanced, their output increasingly mirrors human writing, so detectors that try to be thorough may overstep and label genuine human creativity and style as machine-generated.
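Roughly speaking, many detectors score how statistically predictable a text is under a language model, and flag unusually predictable text as likely AI. The toy sketch below imitates that idea with a simple word-bigram overlap score; the reference corpus, sample sentences, and scoring rule are all made up for illustration, and real detectors use far larger neural models:

```python
def bigram_predictability(text, corpus):
    """Toy stand-in for a detector: the fraction of word bigrams in `text`
    that also appear in a reference corpus. Higher = more 'predictable'."""
    def bigrams(s):
        words = s.lower().split()
        return list(zip(words, words[1:]))

    corpus_bigrams = set(bigrams(corpus))
    text_bigrams = bigrams(text)
    if not text_bigrams:
        return 0.0
    hits = sum(1 for b in text_bigrams if b in corpus_bigrams)
    return hits / len(text_bigrams)

# Hypothetical reference corpus of "typical" academic prose
corpus = "in conclusion the results show that the data supports the claim"

formulaic = "in conclusion the data supports the claim"
quirky = "grandma's chili recipe somehow explains the data better"

print(bigram_predictability(formulaic, corpus))  # high score: reads as 'AI-like'
print(bigram_predictability(quirky, corpus))     # low score: reads as 'human'
```

The point of the sketch is that a polished, conventional essay scores as highly predictable under this kind of metric, even though a human wrote it, while idiosyncratic phrasing scores low. That asymmetry is the mechanism behind many false positives.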
Are There Fake AI Content Detectors Out There?
As with any technology, the market is rife with both genuine and counterfeit products. We are all in the business of making money, and while you may opt for the legal and honest route, there are crooks who will do anything to make big bucks, even to the detriment of others. Some detectors promise accurate results without the technological backing to deliver, which leads to false positives and negatives. Surf the Internet and you will find many startups and solutions claiming to have the upper hand in AI detection.
This conundrum emphasizes the need for thorough research and perhaps institutional validation before incorporating these detectors into academic evaluation. Without standardized benchmarks and validation criteria, institutions and users struggle to sift through the noise. So, it is important to approach these tools with a healthy dose of skepticism. Always prioritize established and peer-reviewed solutions over flashy, unsubstantiated claims.
Students and Professors Should Be Aware of False Positives
Awareness is the first step towards resolution. Both students and professors should be informed about the limitations of AI detectors. This cognizance will ensure that they approach the results with a discerning eye, prioritizing manual evaluations and conversations over blind trust in technology. Importantly, AI reports should serve as resources, not deciders; educators must always make the final determination.
Academic integrity is the centerpiece of educational institutions worldwide. Students should produce original work to ensure they can understand and apply the knowledge they acquire. Nonetheless, the evolution of AI tools like ChatGPT has made it possible to generate human-like content, blurring the lines between human and AI-generated work. As such, AI detectors play a significant, if controversial, role in upholding this integrity.
So, Let’s Address the Reddit User’s Concerns
For those worried about their essays being mistaken for AI-generated work: you are not alone, and there are ways to deal with the problem. For starters, maintain transparency and discuss any doubts directly with your professors. As for the accuracy of tools like GPTZero, know that they can produce both false positives and false negatives; while they have achieved significant milestones, they are not infallible. Finally, avoid writing like a “bot” by maintaining a unique voice and injecting personal experiences and perspectives into your writing.