TL;DR: Academics and content moderators should reassess their evaluations of AI detection tools, also known as AI checkers. The prior consensus, formed in 2023 from academic papers and popular media, is outdated. The latest AI detection models, like Pangram, have not only caught up to the latest LLMs, but are being built to be future-proof.
How did we get here: A brief history of AI detection
When ChatGPT launched in 2022, writers and content creators flocked to the new tool to create, and they haven't stopped since. Creators leveraged ChatGPT and its competitors to write everything from large documents like research papers and novels to small snippets like emails, Reddit comments, and Amazon reviews. However, since LLMs' inception, there has been demand to separate what is human-written from what is AI-written. OpenAI saw this need and created a product that would classify text as AI-generated. As AI adoption rose, demand for AI detectors, or classifiers, increased, particularly at schools and universities where academic integrity is paramount. Some students (who are often early adopters) used the latest models to finish assignments, take tests, and apply for college. Some researchers, under time pressure, cut corners and submitted AI-written or AI-assisted work for publication. Many tools were launched in hopes of addressing these concerns. Academic software incumbents like Turnitin launched a tool in April 2023 called AI Checker to cater to existing education customers. Grammarly launched its own tool in 2024 called Grammarly Authorship. The prevailing thought at these companies was that if their tools could identify plagiarism, they should also be able to identify AI. However, it was clear early on this wasn't going to work.
Early AI detectors promised accuracy by relying on perplexity and burstiness. Pangram's CTO, Bradley Emi, explains these terms: “Perplexity is how unexpected, or surprising, each word in a piece of text is. Burstiness is the change in perplexity over the course of a document. If some surprising words and phrases are interspersed throughout the document, then it has high burstiness.”
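To make the two terms concrete, here is a toy sketch of how they could be measured. It uses a simple smoothed unigram model built from the document itself (a real detector would use a large language model's probabilities), treating perplexity as the average surprisal per word and burstiness as how much that surprisal varies. All function names here are illustrative, not Pangram's implementation.

```python
import math
from collections import Counter

def surprisals(words, counts, total, vocab_size):
    # Laplace-smoothed unigram surprisal (negative log probability) per word
    return [-math.log((counts[w] + 1) / (total + vocab_size)) for w in words]

def perplexity_and_burstiness(text):
    """Toy measures: perplexity = average surprise per word;
    burstiness = how much that surprise varies across the text."""
    words = text.lower().split()
    counts = Counter(words)
    s = surprisals(words, counts, len(words), len(counts))
    mean = sum(s) / len(s)
    perplexity = math.exp(mean)
    # Burstiness proxy: standard deviation of per-word surprisal
    variance = sum((x - mean) ** 2 for x in s) / len(s)
    return perplexity, math.sqrt(variance)
```

A fully repetitive text ("a a a a") gets minimal perplexity and zero burstiness, while varied wording scores higher on both, which is the intuition early detectors leaned on.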
We won’t get tied up in these terms, but leaning on these factors too heavily when building an AI detection tool produces consistent flaws, chief among them flagging human-written text as AI.
These are real concerns for academic institutions in particular. The weight of falsely accusing students and researchers of using AI in their studies and papers is enormous; it can ruin careers. It’s incredibly risky to rely on a tool that is only 95% effective at filtering out AI. As a result, many top academic institutions like MIT, Vanderbilt, and UC Berkeley do not support their teachers using AI detectors. In many instances, they cited research papers that outlined how badly AI detectors perform, like Testing of Detection Tools for AI-Generated Text, and industry articles such as Why AI writing detectors don’t work.
OpenAI found these issues so intractable that it gave up on its AI Text Classifier in July 2023, announcing that “the AI classifier is no longer available due to its low rate of accuracy.” Many school administrators reached a conclusion: if OpenAI can’t do it, it’s probably impossible.
While top universities and the general public reached a consensus that the promise of AI detection was impossible or even snake oil, companies like Pangram Labs built significant improvements in the space that make AI detection a key tool in university and enterprise settings.
Why AI detection is different in 2025
AI detection is often described as an arms race between students looking for shortcuts and educators looking to separate what’s human-written from what isn’t. In 2025, the detectors have upped the stakes.
In August 2025, two Chicago Booth researchers, Brian Jabarian and Alex Emi, published a paper, Artificial Writing and Automated Detection, stating that “most commercial AI detectors perform remarkably well, with Pangram in particular achieving a near zero False Positive Rates and False Negative Rates.” They call out Pangram as “the only detector that meets a stringent policy cap (False Positive Rates ≤ 0.005) without compromising the ability to accurately detect AI text.” This is an example of how far AI detection has come in a few short years. But how did this happen?
First, AI researchers have improved datasets by collecting a wider range of human-written and AI-generated text. This includes not only academic papers, but other writing like emails and articles. Second, developers have employed active learning to reduce false positive rates: they go out and find the text that is hardest to classify as AI or human, then feed it back into their models.
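The active-learning loop described above can be sketched in a few lines. This is a simplified illustration, not Pangram's pipeline: the "model" is a stand-in scoring function, and in practice the hardest examples would be sent to human annotators before being added to the training set.

```python
def uncertainty(prob_ai):
    # 0.5 is maximally uncertain; confident calls (near 0 or 1) score low
    return 1.0 - abs(prob_ai - 0.5) * 2

def active_learning_round(model, labeled, unlabeled, batch_size=2):
    """One round: move the batch_size examples the model is least sure
    about from the unlabeled pool into the labeled training set."""
    ranked = sorted(unlabeled, key=lambda text: uncertainty(model(text)),
                    reverse=True)
    hardest = ranked[:batch_size]
    labeled = labeled + hardest          # would carry human labels in practice
    remaining = [t for t in unlabeled if t not in hardest]
    return labeled, remaining
```

Repeating this loop concentrates labeling effort on exactly the borderline cases that drive false positives, which is why it is effective at pushing error rates down.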
And in the arms race, generative AI makers haven’t changed their models enough to break the best AI detectors. When OpenAI’s much-hyped GPT-5 was released, it promised reduced hallucinations, enhanced tone, and more creative writing. Within 12 hours, Pangram Labs co-founder Max Spero posted on LinkedIn that, without any additional training, Pangram’s AI detection tool could classify GPT-5 text at a rate similar to prior models:
“Pangram is the only AI detector that reliably is able to detect GPT-5 without being explicitly trained to do so.”
Institutions are catching up to the new reality
There are genuine concerns about the use of AI detectors. Many of them still have alarming false positive rates and falsely advertise their accuracy. However, some of the latest technology is incredibly reliable and is actively being integrated into enterprise businesses and universities. For example, the expert sourcing company Qwoted recently integrated AI detection into its workflow to reduce AI-written quotes from ‘experts’: “The future of journalism depends on trust. This is why we’re delighted to be partnering with Pangram, which has set the gold standard for AI detection and attribution.”
Researchers and journalists are also coming back into the fold. Long-time critics are updating their priors and investigating ways to incorporate AI detection into a wider AI policy. Rob Waugh of Press Gazette recently recommended Pangram for readers looking to spot AI-generated writing: “Such tools are not 100% reliable, but Pangram has been rated as accurate compared to other online AI checkers, and is integrated into journalist response services such as Qwoted to detect AI-generated pitches and copy.”
We’re interested in discussing your use case and whether Pangram could be valuable for your organization. Test us out and contact us about our enterprise offerings.