AI detection is often described as an "arms race" between large language models, detectors, and "humanizers": a class of online tools designed to obfuscate AI-generated text, often by introducing intentional errors, so that the result sounds human.
At Pangram, we are always trying to stay ahead of the curve, reacting to the latest advancements in both new models and humanizers.
In January 2025, we published an update to our technical report in which we audited 19 different humanizer and paraphraser tools. The humanizer landscape is evolving rapidly, however, so we want to publish updated numbers from our latest humanizer benchmark.
| Humanizer | Pangram detection accuracy |
|---|---|
| Ahrefs | 100.0% |
| aihumanizer.com | 100.0% |
| Bypass GPT | 99.7% |
| DIPPER | 97.6% |
| Ghost AI | 100.0% |
| GPTinf | 99.2% |
| Grammarly | 100.0% |
| humanizeai.io | 93.8% |
| humanizeai.pro | 100.0% |
| Just Done | 93.5% |
| Quillbot | 100.0% |
| Scribbr | 99.0% |
| Semihuman AI | 100.0% |
| Smodin | 100.0% |
| StealthGPT | 95.6% |
| Surfer SEO | 100.0% |
| surgegraph.io | 100.0% |
| TwainGPT | 92.7% |
| Undetectable AI | 90.3% |
| Writesonic AI | 98.1% |
Pangram performs above 90% on all the notable humanizers that we tested.
In Russell et al., Pangram is benchmarked against GPTZero and several open-source methods on humanized text. Pangram's best model is 97% accurate on humanized text, compared to GPTZero at 46%, FastDetectGPT at 23%, and Binoculars at 7%.
Pangram's performance on humanized text compared to other detectors
A recent study by Jabarian and Imas found that Pangram is the only detector among the four commercial detectors tested whose performance is robust to humanizers:
> For longer passages, Pangram detects nearly 100% of AI-generated text. The FNR increases a bit as the passages get shorter, but still remains low. The other detectors are less robust to humanizers. The FNR for Originality.AI increases to around 0.05 for longer text, but can reach up to 0.21 for shorter text, depending on the genre and LLM model. GPTZero largely loses its capacity to detect AI-generated text, with FNR scores around 0.50 and above across most genres and LLM models. RoBERTa does similarly poorly with high FNR scores throughout.
There are several ways that you can tell by eye that a text has been fed through a humanizer.
One of the easiest ways to spot a humanizer is to look for "tortured phrases": out-of-place synonym replacements meant to disguise plagiarism. Word-spinner tools such as Grammarly and Quillbot were using these synonym-replacement algorithms to disguise plagiarism even before the rise of LLMs.
Examples of tortured phrases would be "counterfeit consciousness" instead of "artificial intelligence", or "bosom peril" instead of "breast cancer." We heard a funny case last year of "Martin Luther Ruler, Jr." showing up in a student essay in place of "Martin Luther King, Jr."
Be careful not to rely on tortured phrases as the only signal of humanized AI text: they also commonly show up in nonnative English writing, where writers may misjudge the meaning or typical usage of certain words.
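Even so, a quick scan against a dictionary of known tortured phrases makes a useful first pass. Here is a minimal sketch in Python; the phrase list is a hypothetical starter set for illustration, not Pangram's internal data:

```python
# A minimal first-pass scan against known tortured phrases. The phrase list
# below is a small hypothetical starter set, not Pangram's internal data.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer",
    "martin luther ruler": "martin luther king",
}

def find_tortured_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, likely original) pairs found in the text."""
    lowered = text.lower()
    return [(phrase, original)
            for phrase, original in TORTURED_PHRASES.items()
            if phrase in lowered]

print(find_tortured_phrases(
    "The study of counterfeit consciousness has advanced rapidly."
))  # -> [('counterfeit consciousness', 'artificial intelligence')]
```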
Humanizers often try to fool the tokenizers of AI detectors by adding or removing spaces. Removing the space between sentences is especially common.
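These spacing artifacts are easy to check for heuristically. The sketch below flags sentence boundaries with missing spaces and counts runs of doubled spaces; it is an illustration, not how Pangram's tokenizer actually handles them:

```python
import re

def missing_sentence_spaces(text: str) -> list[str]:
    """Find places where a sentence ends and the next begins with no
    space, e.g. 'clear.However', a common humanizer artifact."""
    # A lowercase letter, sentence-ending punctuation, then an uppercase
    # letter with no intervening whitespace.
    pattern = re.compile(r"[a-z][.!?][A-Z][a-z]")
    return [m.group(0) for m in pattern.finditer(text)]

def doubled_spaces(text: str) -> int:
    """Count runs of two or more consecutive spaces, a common
    space-insertion artifact."""
    return len(re.findall(r" {2,}", text))

sample = "The results were clear.However, more work is  needed."
print(missing_sentence_spaces(sample))  # -> ['r.Ho']
print(doubled_spaces(sample))           # -> 1
```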
Humanized AI text still exhibits the same repetitive phrases as non-humanized AI text. It is especially telling that text came from a humanizer if the same tortured phrase appears twice in the same document, as it is evidence that the humanizer is systematically applying the same synonym replacements.
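A related check is to count verbatim-repeated word n-grams; an unusual phrase occurring multiple times in one document hints at systematic substitution. A minimal sketch:

```python
from collections import Counter

def repeated_ngrams(text: str, n: int = 3, min_count: int = 2) -> dict[str, int]:
    """Count word n-grams appearing at least min_count times; a humanizer
    applying the same synonym substitution tends to repeat odd phrases
    verbatim."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {g: c for g, c in Counter(grams).items() if c >= min_count}

doc = ("The counterfeit consciousness model improved. "
       "Later, the counterfeit consciousness model failed.")
print(repeated_ngrams(doc))
# -> {'the counterfeit consciousness': 2, 'counterfeit consciousness model': 2}
```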
Humanizers also typically use non-standard Unicode characters to fool the tokenizers of AI detectors. For example, one popular humanizer substitutes U+2009 (THIN SPACE) for the normal space character. We recommend https://www.soscisurvey.de/tools/view-chars.php, which reveals the non-printable characters that may be hidden in copied and pasted strings.
Example of non-printable characters in humanized text
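You can run the same kind of check locally. This Python sketch flags any character outside plain printable ASCII, which catches substitutions like U+2009; a real pipeline would also whitelist legitimate non-ASCII such as accented letters, so treat this as a rough illustration:

```python
import unicodedata

def suspicious_characters(text: str) -> list[tuple[int, str, str]]:
    """List (position, repr, Unicode name) for every character outside
    plain printable ASCII, e.g. a THIN SPACE swapped in for a space."""
    flagged = []
    for i, ch in enumerate(text):
        if ord(ch) > 126 or (ord(ch) < 32 and ch not in "\n\t"):
            # name() has no entry for some characters, so supply a default.
            flagged.append((i, repr(ch), unicodedata.name(ch, "UNKNOWN")))
    return flagged

sample = "This looks normal,\u2009but it is not."
print(suspicious_characters(sample))
# -> [(18, "'\\u2009'", 'THIN SPACE')]
```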
Using Pangram's new Writing Playback feature in Google Docs, you can also check whether a significant portion of the text in a Google Doc was copied and pasted rather than typed manually.
Example of writing playback showing copy and paste
There are several reasons why Pangram is not a perfect detector on humanized AI text.
Pangram is not willing to compromise on its false positive rate. Several of our internal models can detect humanizers with near-perfect accuracy but exhibit higher false positive rates. We do not ship these models because it is more important to us that genuine human writing never gets flagged as AI than that we catch every humanizer output.
Extremely low-quality "junk" text is easily detectable by eye. In most cases where Pangram misses humanized output, the text is so badly garbled and obfuscated that it barely resembles English. These cases are easy to spot by eye but hard to catch algorithmically, because there are infinitely many ways to produce gibberish. We would rather descope gibberish than try to detect it; distinguishing human gibberish from humanizer gibberish is not even a well-posed problem.
Humanizer detection is an active area of research for Pangram, and we will continue to characterize the properties of these humanizers and publish our research into detecting their outputs. If Pangram is to be seen as a reliable tool for academic integrity, we must be able to detect text produced by these cheating tools as well as text copied and pasted directly from large language models.