Pangram 3.3
May 13, 2026 · Elyas Masrour · Max Spero · Bradley Emi · Katherine Thai
Model Description
Pangram 3.3 is an AI detection model released by Pangram Labs in May 2026. It succeeds Pangram 3.2, improving recall on the latest LLMs (Claude 4.7, GPT 5.4+), on humanized texts, and on long-form (2,000+ word) AI-generated content, while reducing false positives on non-native English writing.
As with all Pangram 3 series models, it uses technology from the EditLens technical report.
Inputs
No change from Pangram 3.2.
Outputs
The ai_assistance_score field in API results is still a number between 0 and 1, but will now be normalized to be more easily interpretable. The numbers can be interpreted as follows:
- <= 0.25: Human
- Between 0.25 and 0.5: Lightly AI-assisted
- Between 0.5 and 0.75: Moderately AI-assisted
- >= 0.75: AI
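The thresholds above can be sketched as a small helper. This is a hypothetical convenience function, not part of the Pangram API (which returns only the raw score); the handling of the exact boundary values 0.5 and 0.75 is an assumption.

```python
def interpret_ai_assistance_score(score: float) -> str:
    """Map a normalized ai_assistance_score in [0, 1] to its label bucket.

    Hypothetical helper; thresholds follow the model card, with the
    boundary values 0.5 and 0.75 assigned to the higher bucket as an
    assumption.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score <= 0.25:
        return "Human"
    if score < 0.5:
        return "Lightly AI-assisted"
    if score < 0.75:
        return "Moderately AI-assisted"
    return "AI"

print(interpret_ai_assistance_score(0.1))   # Human
print(interpret_ai_assistance_score(0.8))   # AI
```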
Supported Languages
No change from Pangram 3.2.
Model Training
No change from Pangram 3.2.
Model Inference
No change from Pangram 3.2.
Evaluations
Methodology
- In-domain test set evaluation — held-out examples from the same datasets used for training, useful for understanding model fit.
- Out-of-domain evaluation — completely held-out sources and domains, useful for understanding generalization.
- External benchmarks — useful for field comparison, but should not be trusted as a current measure once released, as benchmarks can be trivially trained on.
Human Datasets — False Positive Rate
| Dataset | FPR | N |
|---|---|---|
| Academic Writing (English) | 0.02% | 62,971 |
| Academic Writing, Google Translated | 0.17% | 600 |
| Amazon Reviews (Multilingual) | 0.01% | 10,425 |
| News (Multilingual) | 0.03% | 100,199 |
| Creative Writing, long-form, English | 0.01% | 10,495 |
| Poetry | 0.49% | 12,769 |
| Biomedical Research Papers | 0.01% | 65,053 |
| How-To Articles, Multilingual | 0.04% | 166,194 |
| AWS Documentation | 0.01% | 11,652 |
| Speeches | 0.00% | 1,058 |
| Movie Scripts | 0.00% | 9,989 |
| Recipes | 0.02% | 22,421 |
AI Datasets by Domain — False Negative Rate
| Dataset | FNR | N |
|---|---|---|
| Academic Writing | 0.00% | 48,443 |
| Creative Writing | 0.23% | 41,940 |
| Chatbot Arena, Random Sample | 1.50% | 2,536 |
3rd Party Benchmarks
| Dataset | Accuracy | FPR | FNR | N |
|---|---|---|---|---|
| Liang et al. (2023): Nonnative English | 100% | 0% | — | 91 |
| Russell et al. (2024): Human Detectors | 100% | 0% | 0% | 300 |
| Dugan et al. (2024): RAID, Random Sample | 99.26% | 0.01% | 0.93% | 66,855 |
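The metrics reported in the tables above follow the standard binary-classification definitions: FPR is the fraction of human-written samples flagged as AI, and FNR is the fraction of AI-generated samples classified as human. A minimal sketch of those definitions, assuming a hypothetical label encoding with 1 = AI and 0 = human:

```python
def detection_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, FPR, and FNR for a binary AI detector.

    Assumes labels are 1 for AI-generated text and 0 for human-written
    text; this encoding is illustrative, not Pangram's internal format.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    humans = y_true.count(0)
    ais = y_true.count(1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "fpr": fp / humans if humans else 0.0,  # false positive rate
        "fnr": fn / ais if ais else 0.0,        # false negative rate
    }

# Toy example: 3 human samples (one wrongly flagged), 2 AI samples (both caught).
print(detection_metrics([0, 0, 0, 1, 1], [0, 1, 0, 1, 1]))
```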
Changes from Pangram 3.2
- Increased recall on the latest LLMs (Claude 4.7, GPT 5.4+).
- Increased recall on humanized texts.
- Increased recall on long-form (2,000+ word) AI-generated content. Long AI documents that Pangram 3.2 occasionally classified as "mixed," with later segments coming back as human, are more consistently classified as fully AI.
- Reduced false positive rate on non-native English writing.
- No degradation in the overall false positive rate.
Intended Usage and Limitations
As with the previous Pangram version, Pangram 3.3 is intended to be used on long-form writing samples in complete sentences.
Bullet point lists, instructions and technical manuals, tables of contents, reference sections, templated or automated writing, and dense mathematical equations are more susceptible to false positives than other domains.
For best results, human-written instructions, headers, footers, and other extraneous formatting should be removed before checking for AI-generated text.
Raw text and .docx files are recommended input file formats over PDFs when available, as PDF parsing may introduce unintended artifacts to the text.
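The preprocessing advice above can be sketched as a simple filter. This is a hypothetical heuristic for illustration, not Pangram's actual pipeline: it drops very short lines (headers, page numbers) and all-caps heading lines, keeping only lines that look like complete prose.

```python
import re

def strip_extraneous_lines(text: str) -> str:
    """Remove likely headers, footers, and formatting lines before detection.

    A hypothetical heuristic, not Pangram's actual preprocessing: drops
    blank lines, lines with fewer than four words (headers, page numbers),
    and all-caps heading lines.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if len(stripped.split()) < 4:  # short headers, footers, page numbers
            continue
        if re.fullmatch(r"[A-Z0-9 .,:\-]+", stripped):  # ALL-CAPS headings
            continue
        kept.append(stripped)
    return "\n".join(kept)

sample = "CHAPTER ONE\nPage 3\nThe quick brown fox jumped over the lazy dog.\n"
print(strip_extraneous_lines(sample))
```

A real pipeline would also strip reference sections, tables of contents, and templated boilerplate, which the model card lists as more susceptible to false positives.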
Ethics & Safety
False accusations of AI usage can lead to serious consequences, including reputational damage, emotional trauma, and other undue harm.
We acknowledge that our model has a non-zero error rate and its errors may result in such harms. We commit to continuing to engage with our users and the academic community to educate others on appropriately contextualizing and communicating the results of AI detection software.
We take reports of false positives extremely seriously and work to mitigate their occurrences to the best of our team's ability.
We commit to training and releasing models with the lowest possible false positive rate, and to improving our evaluations so that future model releases are thoroughly tested and monitored for regressions.
Pangram Labs — Model Card for Pangram 3.3
