

Meet Pangram 3.3!

May 13, 2026

Today, we’re releasing Pangram 3.3. Like previous models in the Pangram 3 series, Pangram 3.3 is based on the EditLens architecture we presented in our ICLR 2026 paper.

View Pangram 3.3's model card here

What to Expect

In the last few weeks, you may have noticed that some text from the latest OpenAI and Anthropic releases has been incorrectly flagged as human. For this update, we focused on reducing our false negative rate, or the rate at which a model incorrectly labels AI-generated text as human, on content written by these newly released models.

As always, we are committed to maintaining our industry-leading false positive rate. We will never release a model that reduces our overall false negative rate at the cost of misclassifying more human-written text as AI-generated. You should see no increase in false positives with Pangram 3.3.

In addition to improving the false negative rate for models like Claude 4.7 and GPT 5.4+, Pangram 3.3 also has better performance on humanized text, long-form documents, and ESL writing benchmarks.

What's Improved

Detection of latest LLMs

Pangram 3.3 is much better than its predecessor at detecting pure outputs from the latest generation of LLMs, including Claude 4.7 and GPT 5.4+. In our internal evaluations, we see a 3x improvement in detection of GPT-5.5 Pro generated text and more than a 4x improvement for Claude Opus 4.7 compared to Pangram 3.2.

Further improvement on humanizer detection

Pangram 3.3 shows marked improvement on humanizer evaluations, catching twice as many commercially humanized texts as its predecessor. Pangram 3.3 is also better at catching adversarially prompted LLM outputs, where users instruct the LLM to evade detection: we observed a 3x improvement on our internal adversarial dataset over the previous Pangram model.

Long document recall

Our previous model sometimes misclassified longer AI-generated documents (over 2000 words) as mixed, in particular mislabeling segments toward the end of the text as fully human. Pangram 3.3 significantly reduces this classification error for long, synthetic texts.

What's Next

AI translation detection

While our overall false positive rate has decreased due to improvements in challenging domains like poetry, we observed a slight uptick in the false positive rate for human-written text that has been passed through Google Translate. We know that translation is a popular use case for LLMs, and we are experimenting with ways to both model and report results for AI-translated text in future models.

Improved identification of AI-assistance

Agent use has exploded over the last six months. We're beginning to see human-AI writing processes evolve into a collaborative model, where multiple rounds of iteration entangle human-written and AI-generated text within a document. A major focus of ours is improving our modeling of this kind of co-written document. We're excited to build on EditLens to bring you the most accurate results on mixed-author text, and to help users understand what it means for text to be "lightly" versus "moderately" AI-assisted.

Model Card

As with our previous two models, you can always check out the current model’s performance on different domains and datasets in our model card.


Katherine Thai, Founding AI Research Scientist

Katherine Thai is the Founding AI Research Scientist at Pangram Labs, an AI detection startup. She completed her PhD in Computer Science under the supervision of Mohit Iyyer at the University of Massachusetts Amherst in December 2025, where her work focused on evaluating LLMs on tasks related to literary analysis.
