Pangram 3.2
February 27, 2026 · Katherine Thai · Elyas Masrour · Max Spero · Bradley Emi
Model Description
Pangram 3.2 is an AI detection model released by Pangram Labs on February 25, 2026. It is the successor to Pangram 3.1, with incremental improvements on recall at the same state-of-the-art false positive rate as its predecessor.
Pangram 3.2 is also better able to classify shorter texts of 50-75 words than Pangram 3.1.
As with all Pangram 3 series models, it uses technology from the EditLens technical report.
Inputs & Outputs
Inputs
Pangram 3.2 accepts text inputs ranging from a minimum of 50 words to a maximum of 75,000 characters. The 50-word minimum is lower than the 75-word minimum of Pangram 3.1.
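The input limits above can be checked client-side before submitting a document. A minimal sketch, assuming the limits quoted in this card; the helper itself is hypothetical and not part of the Pangram API:

```python
MIN_WORDS = 50      # minimum input length, per the model card
MAX_CHARS = 75_000  # maximum input length, per the model card

def validate_input(text: str) -> str:
    """Hypothetical client-side check of Pangram 3.2's input limits.

    Raises ValueError if the text is outside the supported range,
    otherwise returns the text unchanged.
    """
    if len(text) > MAX_CHARS:
        raise ValueError(f"Input exceeds {MAX_CHARS} characters")
    if len(text.split()) < MIN_WORDS:
        raise ValueError(f"Input is shorter than {MIN_WORDS} words")
    return text
```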
Segment Level Outputs
Pangram 3.2 emits both document-level and segment-level predictions. The model splits input text into an array of segments, which are individual chunks or windows of text within the document. Each segment receives the following predictions:
One of: AI-Generated, Moderately AI-Assisted, Lightly AI-Assisted, or Human-Written.
A score between 0.0 and 1.0 indicating the level of AI involvement, where 0.0 means no AI assistance and 1.0 means fully AI-generated.
The model's confidence level for this classification (High, Medium, or Low).
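One way to represent a segment-level prediction as described above. The field and class names here are illustrative, not the actual API schema; only the label values, score range, and confidence levels come from this card:

```python
from dataclasses import dataclass
from enum import Enum

class SegmentLabel(Enum):
    # The four segment-level classes listed in the model card.
    AI_GENERATED = "AI-Generated"
    MODERATELY_AI_ASSISTED = "Moderately AI-Assisted"
    LIGHTLY_AI_ASSISTED = "Lightly AI-Assisted"
    HUMAN_WRITTEN = "Human-Written"

class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"

@dataclass
class Segment:
    """Illustrative container for one segment-level prediction."""
    text: str
    label: SegmentLabel
    score: float  # 0.0 = no AI assistance, 1.0 = fully AI-generated
    confidence: Confidence

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be in [0.0, 1.0]")
```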
Document Level Outputs
These are all possible classification results that our AI detection model can return when analyzing a document.
Fully Human Written
The entire document is identified as human-written, with no AI involvement detected.
Human Written
The vast majority of the document (90% or more) is identified as human-written. A small portion appears to have been refined or polished with the help of AI tools, but no directly AI-generated content is detected.
Mostly Human Written
The vast majority of the document (90% or more) is identified as human-written, but a small amount of AI-generated content is detected.
Fully AI Generated
The entire document is identified as AI-generated, with no human-written content detected.
AI Assisted
This result is returned in several scenarios where the document shows signs of AI assistance rather than direct AI generation. These include documents that are entirely composed of lightly AI-assisted content, entirely composed of moderately AI-assisted content, or a combination of different levels of AI assistance. It also applies when the document contains a mix of AI-assisted and human-written content without any directly AI-generated passages, or when the document is predominantly AI-assisted with some human writing present.
AI Detected
This result indicates that directly AI-generated content has been identified in the document. It applies across a range of scenarios, including documents that are predominantly AI-generated with some human-written or AI-assisted content, documents that are mostly human-written but contain some AI-generated passages, and documents that contain a blend of AI-generated, AI-assisted, and human-written content in varying proportions.
Mostly Human, AI Detected
The document is primarily human-written (at least 70%), but some AI-generated content is detected. In some cases, AI-assisted content may also be present alongside the AI-generated portions.
Mostly Human, AI Assisted
The document is primarily human-written (at least 70%), and while no directly AI-generated content is found, some portions appear to have been created with the assistance of AI tools.
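The document-level labels above can be read as rules over the shares of each segment class in a document. The sketch below encodes those rules; the 90% and 70% thresholds come from this card, but the exact precedence logic is an illustrative reconstruction, not Pangram's actual decision procedure:

```python
def document_label(fractions: dict) -> str:
    """Map segment-class fractions to a document-level label.

    `fractions` maps "human", "light", "moderate", and "ai" to the
    share of the document in each category (summing to 1.0).
    Illustrative reconstruction of the rules in the model card;
    not Pangram's actual decision logic.
    """
    human = fractions.get("human", 0.0)
    ai = fractions.get("ai", 0.0)

    if human == 1.0:
        return "Fully Human Written"
    if ai == 1.0:
        return "Fully AI Generated"
    if ai > 0.0:
        # Directly AI-generated content present somewhere.
        if human >= 0.9:
            return "Mostly Human Written"
        if human >= 0.7:
            return "Mostly Human, AI Detected"
        return "AI Detected"
    # No directly AI-generated segments below this point.
    if human >= 0.9:
        return "Human Written"
    if human >= 0.7:
        return "Mostly Human, AI Assisted"
    return "AI Assisted"
```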
Model Resolution
Pangram 3.2 is accurate to a resolution of approximately 50 words. This means that shorter segments of human text interspersed with AI text may be classified as AI assisted, but Pangram will not be able to distinguish the human and AI parts at the word- or sentence-level.
Supported Languages
Pangram can work effectively on languages outside of the official support set due to generalization from the LLM backbone. Please contact us to request an evaluation on other languages outside the official support set.
Model Training
Training Datasets
No change from Pangram 3.1.
Architecture
Pangram 3.2 has the same architecture as Pangram 3.1, except we have reduced the context window to 512 tokens. The tokenizer is a standard multilingual tokenizer.
We do not trade off false positives for false negatives: the model is calibrated so that its false positive rate matches that of our previous release, with the recall improvements coming on top of that fixed operating point.
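Calibrating to a fixed false positive rate can be sketched as choosing a decision threshold on a held-out human-written validation set. The helper below is a hypothetical illustration of that idea; the card does not describe Pangram's actual calibration procedure:

```python
import numpy as np

def threshold_for_fpr(human_scores, target_fpr: float) -> float:
    """Pick a decision threshold so that roughly `target_fpr` of
    human-written validation documents score above it.

    Illustrative sketch: `human_scores` are AI-likelihood scores on
    a held-out human-written set. The (1 - target_fpr) empirical
    quantile leaves only target_fpr of human documents above the
    returned threshold.
    """
    scores = np.asarray(human_scores)
    return float(np.quantile(scores, 1.0 - target_fpr))
```

With a 1% target FPR, about 1% of held-out human documents fall above the returned threshold, and any recall gains must be achieved without moving that operating point.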
Hardware
Trained for 3 days on 8 NVIDIA H100 GPUs.
Model Inference
Preprocessing
No change from Pangram 3.1.
Inference
Pangram 3.2 uses the Adaptive Boundaries algorithm described in our Pangram 3.1 model card.
Postprocessing
No change from 3.1.
Inference Latency
No change from 3.1.
Evaluations
Methodology
- In-domain test set evaluation — held-out examples from the same datasets used for training, useful for understanding model fit.
- Out-of-domain evaluation — completely held-out sources and domains, useful for understanding generalization.
- External benchmarks — useful for field comparison, but should not be trusted as a current measure once released, as benchmarks can be trivially trained on.
Human Datasets — False Positive Rate
| Dataset | FPR | N |
|---|---|---|
| Academic Writing (English) | 0.02% | 62,971 |
| Academic Writing, Google Translated | 0.00% | 600 |
| Amazon Reviews (Multilingual) | 0.00% | 10,425 |
| News (Multilingual) | 0.35% | 100,119 |
| Creative Writing, long-form, English | 0.00% | 10,495 |
| Poetry | 0.54% | 12,769 |
| Biomedical Research Papers | 0.01% | 65,053 |
| How-To Articles, Multilingual | 0.17% | 166,194 |
| AWS Documentation | 0.00% | 11,652 |
| Speeches | 0.19% | 1,058 |
| Movie Scripts | 0.00% | 9,989 |
| Recipes | 0.05% | 22,421 |
AI Datasets by Domain — False Negative Rate
| Dataset | FNR | N |
|---|---|---|
| Academic Writing | 0.00% | 48,443 |
| Creative Writing | 0.24% | 41,940 |
| Chatbot Arena, Random Sample | 1.98% | 2,536 |
3rd Party Benchmarks
| Dataset | Accuracy | FPR | FNR | N |
|---|---|---|---|---|
| Liang et al. (2023): Nonnative English | 100% | 0% | — | 91 |
| Russell et al. (2024): Human Detectors | 100% | 0% | 0% | 300 |
| Dugan et al. (2024): RAID, Random Sample | 99.43% | 0.07% | 0.90% | 66,855 |
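The metrics reported in the tables above follow the standard definitions, treating AI-generated text as the positive class. As a quick reference (the counts in the usage comment are hypothetical, not Pangram's data):

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int):
    """Standard detection metrics with AI-generated as the positive class.

    FPR: fraction of human-written documents wrongly flagged as AI.
    FNR: fraction of AI-generated documents the detector misses.
    Counts are hypothetical, for illustration only.
    """
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return fpr, fnr, accuracy

# Usage: detection_metrics(tp=99, fp=1, tn=999, fn=1)
# gives FPR = 0.001, FNR = 0.01.
```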
Changes from Pangram 3.1
- Updated the model to detect Claude 4.6.
- Improved recall, particularly on humanized texts.
- Lowered minimum word count from 75 to 50.
Intended Usage and Limitations
Pangram 3.2 is intended to be used on long-form writing samples in complete sentences.
Bullet point lists, instructions and technical manuals, tables of contents, reference sections, templated or automated writing, and dense mathematical equations are more susceptible to false positives than other domains.
For best results, human-written instructions, headers, footers, and other extraneous formatting should be removed before checking for AI-generated text.
Due to potential errors in the PDF parser, raw text and .docx files are recommended over PDFs as input formats when available.
Ethics & Safety
False accusations of AI usage can lead to serious consequences, including reputational damage, emotional trauma, and other undue harm.
We acknowledge that our model has a non-zero error rate and its errors may result in such harms. We commit to continuing to engage with our users and the academic community to educate others on appropriately contextualizing and communicating the results of AI detection software.
We take reports of false positives extremely seriously and work to mitigate their occurrences to the best of our team's ability.
We commit to training and releasing models with the lowest possible false positive rate, and to improving our evaluations so that we continue to thoroughly test and monitor future model releases for regressions.
Pangram Labs — Model Card for Pangram 3.2
