Pangram 3.3

May 13, 2026

Elyas Masrour · Max Spero · Bradley Emi · Katherine Thai

We launched this model from the Brooklyn Navy Yard.

Bugfix: On May 18, 2026, we shipped a bugfix as Pangram 3.3.2. Users may notice changes in a small (<3%) percentage of predictions, but the overall performance of the model is improved.

Note: On May 15, 2026, we shipped a minor update to Pangram 3.3. This version, Pangram 3.3.1, replaces Pangram 3.3 in all of our products. The underlying model is the same for both Pangram 3.3 and 3.3.1. The only difference is in the algorithm we use to segment longer documents. Only documents of over 450 words will be affected by this change.
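To check which stored documents fall in the range affected by the 3.3.1 segmentation change, a word-count test along these lines may help. This is only a sketch: the function name is hypothetical, and counting whitespace-separated tokens is an assumption, since the note does not say how Pangram counts words.

```python
SEGMENTATION_WORD_THRESHOLD = 450  # threshold stated in the 3.3.1 note


def may_be_affected_by_segmentation_change(text: str) -> bool:
    """Return True if the document is longer than 450 words.

    Assumption: a "word" is any whitespace-separated token. Pangram's
    own word count may differ slightly near the threshold.
    """
    return len(text.split()) > SEGMENTATION_WORD_THRESHOLD
```

Documents close to the threshold are worth re-checking either way, because a different word-counting rule could place them on the other side of it.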

Model Description

Pangram 3.3 is an AI detection model released by Pangram Labs in May 2026. It is the successor to Pangram 3.2, with improvements to recall on the latest LLMs (Claude 4.7, GPT 5.4+), improvements to recall on humanized texts, improvements to recall on long-form (2,000+ word) AI-generated content, and a reduction in false positives on non-native English writing.

As with all Pangram 3 series models, it uses technology from the EditLens technical report.

Check out the full announcement on the blog.

Inputs

No change from Pangram 3.2.

Outputs

The ai_assistance_score field in API results is still a number between 0 and 1, but it is now normalized so that it is easier to interpret. The numbers can be interpreted as follows:

+0.25 or below: Human
+Between 0.25 and 0.5: Lightly AI-assisted
+Between 0.5 and 0.75: Moderately AI-assisted
+0.75 or above: AI
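The score ranges listed above can be applied to an API result with a small helper. The following is a minimal sketch, not Pangram's own code: the function name `interpret_score` is hypothetical, and because the published ranges do not say which label a score of exactly 0.5 receives, the sketch assumes it falls in the moderate band.

```python
def interpret_score(score: float) -> str:
    """Map an ai_assistance_score in [0, 1] to its published label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score!r}")
    if score <= 0.25:
        return "Human"
    if score < 0.5:
        return "Lightly AI-assisted"
    if score < 0.75:
        # Assumption: exactly 0.5 is treated as moderately AI-assisted.
        return "Moderately AI-assisted"
    return "AI"
```

For example, `interpret_score(0.1)` returns `"Human"` and `interpret_score(0.75)` returns `"AI"`.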

Supported Languages

No change from Pangram 3.2.

Model Training

No change from Pangram 3.2.

Model Inference

No change from Pangram 3.2.

Evaluations

Methodology

+In-domain test set evaluation: held-out examples from the same datasets used for training, useful for understanding model fit.
+Out-of-domain evaluation: completely held-out sources and domains, useful for understanding generalization.
+External benchmarks: useful for field comparison, but should not be trusted as a current measure once released, as benchmarks can be trivially trained on.

Human Datasets — False Positive Rate

Dataset	FPR	N
Academic Writing (English)	0.02%	62,971
Academic Writing, Google Translated	0.17%	600
Amazon Reviews (Multilingual)	0.01%	10,425
News (Multilingual)	0.03%	100,199
Creative Writing, long-form, English	0.01%	10,495
Poetry	0.49%	12,769
Biomedical Research Papers	0.01%	65,053
How-To Articles, Multilingual	0.04%	166,194
AWS Documentation	0.01%	11,652
Speeches	0.00%	1,058
Movie Scripts	0.00%	9,989
Recipes	0.02%	22,421

AI Datasets by Domain — False Negative Rate

Dataset	FNR	N
Academic Writing	0.00%	48,443
Creative Writing	0.23%	41,940
Chatbot Arena, Random Sample	1.50%	2,536

3rd Party Benchmarks

Dataset	Accuracy	FPR	FNR	N
Liang et al. (2023): Nonnative English	100%	0%	—	91
Russell et al. (2024): Human Detectors	100%	0%	0%	300
Dugan et al. (2024): RAID, Random Sample	99.26%	0.01%	0.93%	66,855
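For readers less familiar with the column headings, the sketch below shows how FPR, FNR, and accuracy are conventionally computed from raw counts. It assumes a binary human/AI decision and is not Pangram's evaluation code; the `Counts` class and function names are hypothetical. A rate is returned as `None` when its class is absent from the dataset, which would correspond to a dash in the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Counts:
    human_total: int    # human-written documents evaluated
    human_flagged: int  # human documents predicted as AI (false positives)
    ai_total: int       # AI-generated documents evaluated
    ai_missed: int      # AI documents predicted as human (false negatives)


def false_positive_rate(c: Counts) -> Optional[float]:
    """Share of human documents flagged as AI; None if no human documents."""
    return c.human_flagged / c.human_total if c.human_total else None


def false_negative_rate(c: Counts) -> Optional[float]:
    """Share of AI documents labeled human; None if no AI documents."""
    return c.ai_missed / c.ai_total if c.ai_total else None


def accuracy(c: Counts) -> float:
    """Share of all documents classified correctly."""
    total = c.human_total + c.ai_total
    if total == 0:
        raise ValueError("no documents evaluated")
    return (total - c.human_flagged - c.ai_missed) / total


# Illustrative only: a human-only dataset of 91 documents with no errors,
# assuming that is how the Liang et al. row is composed.
human_only = Counts(human_total=91, human_flagged=0, ai_total=0, ai_missed=0)
```

With `human_only`, accuracy is 1.0, the false positive rate is 0.0, and the false negative rate is undefined because there are no AI documents to miss.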

Changes from Pangram 3.2

+Increased recall on the latest LLMs (Claude 4.7, GPT 5.4+).
+Increased recall on humanized texts.
+Increased recall on long-form (2,000+ word) AI-generated content. Long AI documents that Pangram 3.2 occasionally classified as "mixed," with later segments coming back as human, are more consistently classified as fully AI.
+Reduced false positive rate on non-native English writing.
+No degradation in the overall false positive rate.

Intended Usage and Limitations

As with the previous Pangram version, Pangram 3.3 is intended to be used on long-form writing samples in complete sentences.

Bullet point lists, instructions and technical manuals, tables of contents, reference sections, templated or automated writing, and dense mathematical equations are more susceptible to false positives than other domains.

For best results, human-written instructions, headers, footers, and other extraneous formatting should be removed before checking for AI-generated text.
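One rough way to automate this cleanup is to keep only lines that look like prose before submitting a document. The heuristic below is an illustrative assumption, not a Pangram recommendation or API: it expects one paragraph per line, drops lines that start with a list marker, and drops lines that do not end in sentence punctuation, which usually catches titles, headers, and page footers. The function name `keep_prose` is hypothetical.

```python
import re

# Assumption: list items start with -, *, +, a bullet character, or "1." / "1)".
_LIST_ITEM = re.compile(r"^(?:[-*+\u2022]|\d+[.)])\s+")
# Assumption: a prose line ends with ., !, or ?, optionally followed by a
# closing quote or bracket.
_SENTENCE_END = re.compile(r"[.!?][\"')\]]*$")


def keep_prose(text: str) -> str:
    """Return only the lines of `text` that look like complete sentences."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if _LIST_ITEM.match(stripped):
            continue  # skip bullet and numbered list items
        if not _SENTENCE_END.search(stripped):
            continue  # skip headers, footers, and other fragments
        kept.append(stripped)
    return "\n".join(kept)
```

Hard-wrapped text would need its lines joined into paragraphs first, otherwise the filter discards every line that breaks mid-sentence. The output is worth reviewing by hand, since a heuristic like this can also drop legitimate prose.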

Raw text and .docx files are recommended input file formats over PDFs when available, as PDF parsing may introduce unintended artifacts to the text.

Ethics & Safety

False accusations of AI usage can lead to serious consequences, including reputational damage, emotional trauma, and other undue harm.

We acknowledge that our model has a non-zero error rate and its errors may result in such harms. We commit to continuing to engage with our users and the academic community to educate others on appropriately contextualizing and communicating the results of AI detection software.

We take reports of false positives extremely seriously and work to mitigate their occurrences to the best of our ability.

We commit to training and releasing models with the lowest possible false positive rate, and to improving our evaluations so that we continue to thoroughly test and monitor future model releases for regressions.

Pangram Labs — Model Card for Pangram 3.3

© 2025 Pangram. All rights reserved.

info@pangram.com