The AI detection market today consists of several large players. You may have heard of them: Pangram, GPTZero, Turnitin, ZeroGPT, and more.
Many of these companies routinely update their models and publish performance numbers. Recently, GPTZero launched a summer model update and released detection rates for a variety of new models. In this blog post, we compare the performance of GPTZero's new model with Pangram's AI detection, including on the latest GPT-5 models.
Model | Pangram Detection Rate | GPTZero Detection Rate | Better Detector |
---|---|---|---|
GPT-5 | 99.81% | 95.0% | Pangram |
GPT-5-chat-latest | 99.97% | Untested | N/A |
GPT-5-mini | 99.92% | 92.2% | Pangram |
GPT-5-nano | 99.97% | 96.1% | Pangram |
GPT-OSS-120b | 100.00% | Untested | N/A |
GPT-OSS-20b | 99.74% | Untested | N/A |
GPT-4.1 | 99.48% | 96.8% | Pangram |
GPT-4.1-mini | 99.94% | 98.7% | Pangram |
o3 | 99.86% | 89.9% | Pangram |
o3-mini | 100.00% | 98.4% | Pangram |
Gemini 2.5 Pro | 99.91% | 95.7% | Pangram |
Gemini 2.5 Flash | 99.75% | 98.2% | Pangram |
Claude Sonnet 4 | 99.91% | 99.1% | Pangram |
Note: GPTZero does not release their internal evaluation datasets to the public, so the two columns are not computed on the same documents. GPTZero also does not release the number of documents they test on, so we cannot compare sample sizes either. For Pangram’s performance numbers, we evaluated on thousands of documents per model, across a wide variety of domains and prompt schemes, to simulate real-world use.
Pangram’s accuracy is not limited to catching the most AI-generated documents. Pangram is also the market leader in keeping false positive rates low. It’s a serious priority for us to not flag human-written documents as AI-generated. The table below compares the reported False Positive Rates for Pangram and GPTZero:
Metric | Pangram | GPTZero |
---|---|---|
False Positive Rate (%) | 0.01% | 1% |
False Positive Rate (#) | ~1 in 10,000 documents | ~1 in 100 documents |
The GPTZero figure comes from GPTZero’s blog post on its false positive rate, which reports a False Positive Rate (FPR) of 1%.
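To make the gap concrete, the reported rates can be converted into expected counts. The following is a minimal Python sketch: the function names and the batch size of 50,000 human-written documents are hypothetical, chosen only for illustration, and the rates are the reported figures from the table above.

```python
def one_in_n(fpr_percent: float) -> int:
    """Convert a false positive rate given in percent to its '1 in N' form."""
    return round(100 / fpr_percent)


def expected_false_flags(fpr_percent: float, human_docs: int) -> float:
    """Expected number of human-written documents wrongly flagged as AI."""
    return human_docs * fpr_percent / 100


# Reported false positive rates, in percent (from the table above).
reported_fpr = {"Pangram": 0.01, "GPTZero": 1.0}

# Hypothetical batch size, for illustration only.
batch_size = 50_000

for name, fpr in reported_fpr.items():
    flags = expected_false_flags(fpr, batch_size)
    print(f"{name}: 1 in {one_in_n(fpr):,} documents, "
          f"about {flags:.0f} false flags per {batch_size:,} human-written documents")
```

The expected count scales linearly with the number of human-written documents checked, so the difference matters most at institutional volumes.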
Pangram and GPTZero have also gone head-to-head in peer-reviewed AI research papers. The best example is the recent University of Maryland study “People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text.” This study investigated the ability of expert human annotators to distinguish between human-written and AI-generated text.
As part of the study, the human annotators were benchmarked against commercially available and open-source detectors. Pangram performed better than each individual human detector, as well as better than all of the commercial alternatives, including GPTZero.
Detector | GPT-4o | Claude |
---|---|---|
Pangram | 100% | 100% |
GPTZero | 100% | 97.6% |
Annotator 1 | 96.7% | 100% |
Annotator 2 | 96.7% | 100% |
Annotator 3 | 86.7% | 80% |
Annotator 4 | 90.0% | 96.7% |
Annotator 5 | 93.3% | 93.3% |
The differences between Pangram’s flagship model and GPTZero don’t end there. Both models are “multilingual”, meaning they can detect AI-generated text in languages other than English. Pangram is multilingual across all of the top 20 languages on the internet. GPTZero supports English, French, and Spanish. Here are the languages that each model is tested in:
Language | Pangram False Positive Rate (FPR) | GPTZero False Positive Rate (FPR) | Pangram AI Detection Rate | GPTZero AI Detection Rate |
---|---|---|---|---|
Spanish | 0.00% | 5.6% | 100.0% | 96.4% |
French | 0.00% | 3.1% | 100.0% | 93.1% |
Arabic | 0.10% | Untested | 100.0% | Untested |
Czech | 0.00% | Untested | 99.89% | Untested |
German | 0.00% | Untested | 99.68% | Untested |
Greek | 0.00% | Untested | 99.79% | Untested |
Persian | 0.00% | Untested | 100.0% | Untested |
Hindi | 0.00% | Untested | 99.58% | Untested |
Hungarian | 0.10% | Untested | 99.05% | Untested |
Italian | 0.00% | Untested | 100.0% | Untested |
Japanese | 0.00% | Untested | 100.0% | Untested |
Dutch | 0.10% | Untested | 100.0% | Untested |
Polish | 0.00% | Untested | 100.0% | Untested |
Portuguese | 0.00% | Untested | 100.0% | Untested |
Romanian | 0.10% | Untested | 100.0% | Untested |
Russian | 0.00% | Untested | 100.0% | Untested |
Swedish | 0.00% | Untested | 99.89% | Untested |
Turkish | 0.00% | Untested | 99.79% | Untested |
Ukrainian | 0.00% | Untested | 99.89% | Untested |
Urdu | 0.00% | Untested | 98.84% | Untested |
Vietnamese | 0.00% | Untested | 99.89% | Untested |
Chinese | 0.00% | Untested | 99.89% | Untested |
For more information on Pangram’s performance on multilingual text, see this blog post.
Additionally, both models are trained with close attention to ESL performance, as there is a widely held concern that AI detectors may be biased against non-native English speakers. Both GPTZero and Pangram have published results on ESL text in particular. See how they stack up below:
Detector | False Positive Rate | Sample Size |
---|---|---|
Pangram | 0.032% | 25,021 |
GPTZero | 1.1% | 91 |
To read more about Pangram’s approach to ESL text, check out this blog post: https://www.pangram.com/blog/how-accurate-is-pangram-ai-detection-on-esl
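Sample size matters when reading the ESL table, because a rate measured on 91 documents carries far more statistical uncertainty than one measured on 25,021. One way to see this is a 95% Wilson score interval, sketched below in Python. The false positive counts (8 and 1) are not published figures: they are an assumption, inferred by multiplying each reported rate by its sample size and rounding.

```python
import math


def wilson_interval(false_positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = false_positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# (assumed false positive count, reported sample size).
# Counts are inferred from the reported rates, not published directly.
esl_results = {
    "Pangram": (8, 25_021),  # 0.032% of 25,021 is about 8
    "GPTZero": (1, 91),      # 1.1% of 91 is about 1
}

for name, (fp, n) in esl_results.items():
    low, high = wilson_interval(fp, n)
    print(f"{name}: observed {fp / n:.3%}, 95% interval {low:.3%} to {high:.3%}")
```

The interval for the 91-document sample is much wider than the one for the 25,021-document sample, which is why the sample size column is reported alongside the rate.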
Another concern for those in the market for AI detection is performance on models released after the detector was trained. As the AI wars continue to expand, large AI labs and small upstarts regularly release important new models. It’s important for an AI detection solution to continue to provide accurate results on models that it might not have been trained on directly.
The recent release of GPT-5 provided a great opportunity to test this! Within hours of the new model release, the Pangram team tested the performance of GPTZero and Pangram on GPT-5 outputs from a variety of prompt types. Here’s how they did:
Document | Pangram | GPTZero |
---|---|---|
Document 1 | 100% | 2% |
Document 2 | 100% | 0% |
Document 3 | 100% | 0% |
Document 4 | 100% | 0% |
Document 5 | 100% | 9% |
Document 6 | 99% | 0% |
Document 7 | 100% | 0% |
Document 8 | 100% | 0% |
Document 9 | 100% | 29% |
Document 10 | 100% | 0% |
Document 11 | 100% | 10% |
Note: GPTZero has since released a model update that claims to perform better on GPT-5! For more details on our original comparison, please see [this blog post](https://www.pangram.com/blog/gpt-5). Additionally, we encourage users to complete their own tests to compare performance at any given point.
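For readers who want to run their own comparison, the tallying step might look like the Python sketch below. Everything in it is an assumption for illustration: `detector` stands for any callable that returns an AI-likelihood score between 0 and 1, the 0.5 threshold is arbitrary, and `toy_detector` is a stand-in, not the Pangram or GPTZero API.

```python
from typing import Callable, Iterable, Tuple


def evaluate(
    detector: Callable[[str], float],
    samples: Iterable[Tuple[str, bool]],
    threshold: float = 0.5,
) -> dict:
    """Tally detection rate and false positive rate on labeled (text, is_ai) pairs."""
    ai_total = ai_flagged = human_total = human_flagged = 0
    for text, is_ai in samples:
        flagged = detector(text) >= threshold
        if is_ai:
            ai_total += 1
            ai_flagged += flagged
        else:
            human_total += 1
            human_flagged += flagged
    return {
        # Share of AI-generated documents that were flagged.
        "detection_rate": ai_flagged / ai_total if ai_total else float("nan"),
        # Share of human-written documents that were wrongly flagged.
        "false_positive_rate": human_flagged / human_total if human_total else float("nan"),
    }


def toy_detector(text: str) -> float:
    """Hypothetical stand-in; replace with a call to the detector under test."""
    return 1.0 if "delve" in text.lower() else 0.0


labeled = [
    ("Let us delve into the topic.", True),
    ("Here is a summary of the key points.", True),
    ("i wrote this on the bus, sorry for typos", False),
    ("We delve deeper in chapter two.", False),
]
print(evaluate(toy_detector, labeled))
```

A meaningful comparison needs both human-written and AI-generated samples, run through each detector on the same documents, since a detection rate alone says nothing about false positives.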
In the end, Pangram continues to be the robust and reliable choice for detecting AI-generated content. Whether your needs are for education, publishing, content moderation, or something even more unique, we're here with accurate and fair AI detection. Learn more on our blog or reach out at info@pangram.com.