We believe it’s important that institutions can rely on Pangram’s high accuracy, so we encourage third-party verification of our quality metrics (false positives and false negatives). Below, we highlight evaluations of Pangram from researchers at the University of Chicago (UChicago) and the University of Maryland (UMD), as well as from commercial reviewers.
Key Takeaway: Pangram’s internal testing holds up to scrutiny from third parties.
At UChicago’s Becker Friedman Institute for Economics, researchers compared four AI detectors: Pangram, GPTZero, Originality AI, and RoBERTa (an open-source AI detector). The study ran each detector on 1,992 human-written texts from before 2020 and 1,992 AI-generated texts spanning different genres and word counts. The researchers measured two types of detection error, the false positive rate (FPR) and the false negative rate (FNR), and compared them at multiple decision thresholds. The detectors also classified AI-generated text from popular LLMs such as ChatGPT, Claude, and Gemini. Finally, the researchers imposed several FPR policy caps and observed how each detector’s FNR changed under them.
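To make these metrics concrete, here is a minimal sketch, in Python, of how FPR and FNR can be computed for a score-based detector at a given threshold, and how one might choose the most sensitive threshold that still satisfies an FPR policy cap. This is our illustration under assumed conventions (scores in [0, 1], higher meaning more likely AI), not code or data from the study; the toy scores are made up.

```python
def fpr_fnr(scores_human, scores_ai, threshold):
    """Compute error rates when scores >= threshold are flagged as AI."""
    # False positive rate: fraction of human texts incorrectly flagged as AI.
    fpr = sum(s >= threshold for s in scores_human) / len(scores_human)
    # False negative rate: fraction of AI texts that escape detection.
    fnr = sum(s < threshold for s in scores_ai) / len(scores_ai)
    return fpr, fnr

def threshold_under_cap(scores_human, scores_ai, fpr_cap=0.005):
    """Lowest (most sensitive) threshold whose FPR stays within the cap."""
    for t in sorted(set(scores_human) | set(scores_ai)):
        fpr, fnr = fpr_fnr(scores_human, scores_ai, t)
        if fpr <= fpr_cap:
            return t, fpr, fnr
    return None  # no threshold satisfies the cap

# Toy example with made-up detector scores (not from the study):
human_scores = [0.01, 0.02, 0.03, 0.05, 0.10]
ai_scores = [0.60, 0.88, 0.95, 0.97, 0.99]
print(threshold_under_cap(human_scores, ai_scores, fpr_cap=0.005))
# -> (0.6, 0.0, 0.0): at threshold 0.6, zero FPR and zero FNR on this toy data
```

Lowering the cap forces the threshold up, which is why a stricter FPR cap generally raises FNR; the study measures how much each detector’s FNR suffers under that constraint.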
From the study, Artificial Writing and Automated Detection by Brian Jabarian and Alex Imas, published in August 2025:
Pangram dominates the other detectors across all thresholds.
Pangram is the only detector that meets a stringent policy cap (FPR ≤ 0.005) without compromising the ability to accurately detect AI text.
Pangram remains the low-cost leader in all genres and on average: $0.0228 per correctly flagged AI passage versus $0.0416 for OriginalityAI and $0.0575 for GPTZero, making Pangram the most cost-efficient detector for both full-length passages and stubs.
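One plausible reading of this per-flag cost metric (our interpretation; the study’s exact accounting may differ) is total scanning spend divided by the number of AI passages correctly flagged, which reduces to price per scan divided by recall. A toy sketch with made-up figures:

```python
def cost_per_correct_flag(price_per_scan, n_ai_passages, recall):
    """Total spend on scanning AI passages divided by true positives."""
    true_positives = n_ai_passages * recall
    return (price_per_scan * n_ai_passages) / true_positives  # = price / recall

# Hypothetical numbers, not the study's: $0.02 per scan at 90% recall.
print(f"${cost_per_correct_flag(0.02, 1000, 0.90):.4f} per correctly flagged passage")
# -> $0.0222 per correctly flagged passage
```

Under this reading, a detector can be cheapest per correct flag either by charging less per scan or by missing fewer AI passages.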
The study showed that:
Pangram achieves essentially zero false positive and false negative rates on medium-length to long passages.
Pangram’s high accuracy held across different genres of text, including blogs, reviews, resumes, news, and novels. On shorter-form text, the false positive and false negative rates increase slightly “but remain well below reasonable policy thresholds.”
UChicago’s researchers pointed out Pangram’s superior performance compared to other available AI detectors. When given an FPR cap of 0.0001, “neither GPTzero nor Originality.AI do very well under the most stringent FPR policy cap… Pangram still achieves an FNR rate of around 0.01 on most LLM models.”
Pangram no longer returns predictions for text under 50 words, but as noted in the study,
Pangram’s performance largely holds up on very short passages (< 50 words) and is robust to “humanizer” tools (e.g., StealthGPT), while the performance of other detectors becomes case-dependent.
In Experiment 1 of this UMD study, annotators with varying levels of LLM expertise were asked to judge whether a text was AI-generated. After observing that one annotator was nearly perfect at identifying AI text, the researchers recruited four additional expert annotators with similar LLM-usage backgrounds to classify the same sample of 60 texts. The expert votes were then compared against commercial detectors such as Pangram, Pangram Humanizer, and GPTZero, as well as open-source tools like Fast-DetectGPT. Throughout this comparison, Pangram held up well against the other detectors.
Pangram's consistent performance against paraphrased and humanized text
Pangram can accurately detect humanized AI-generated text. This is corroborated by computer scientists at UMD, who found that Pangram scored highest overall at detecting humanized and paraphrased text, outperforming other AI detection software with 99.3% accuracy.
Learn more about how Pangram holds up against humanizers
Amanda Caswell at Tom’s Guide wrote that, after she tried dozens of AI detection tools, Pangram “outperformed the others I tried.” The article also noted that Pangram is diligently working to reduce its already low incidence of false positives.
David Gewirtz at ZDNET describes Pangram as “a newcomer to our tests that immediately soared into the winners' circle.”
As AI usage in research papers increases, there is concern that it may indicate misconduct. In a Medium article, Adam Day used Pangram’s AI detection to reliably measure the prevalence of AI content in the published literature, while also concluding that there are legitimate use cases for generative AI in research. Day recommends Pangram for this kind of research, saying: “if someone wants to do a survey of genAI usage in the published literature, I think there’s a great opportunity to do that with Pangram’s tools.”
UMD researchers (in collaboration with Microsoft and Pangram) used Pangram’s AI detection in a recent study to analyze the presence of AI-generated text in the news, drawing on a sample of 186,000 newspaper articles. Although only a small percentage of articles were found to be AI-generated, that AI use was not disclosed. Pangram identified “219 articles containing AI content on the opinion pages of The New York Times, The Wall Street Journal and The Washington Post.”
The study also surfaced nuances in AI usage, such as:
Reporters who write their own articles may not be aware that the people they quote used AI to craft their responses.
AI in the news, using Pangram’s detection
At Pangram, we believe transparency is essential to trust. We’d love to partner with you to bring AI transparency to your organization.
