New

Instantly know what's human and AI on Twitter, LinkedIn, Substack and more. Get our new Chrome extension.

model

Pangram 3.1

Janurary 16, 2026

Bradley Emi · Katherine Thai · Elyas Masrour · Max Spero

We launched this model during the snowy weather in Brooklyn, NY

We launched this model during the snowy weather in Brooklyn, NY

Model Description

Pangram 3.1 is an AI detection model released by Pangram Labs on January 16, 2026. It is the successor to Pangram 3.0, which was the first Pangram AI detection model with the capability to classify homogeneous mixed text, using technology from the EditLens technical report.

Pangram 3.1 is a transformer-style architecture adapted for sequence classification trained using deep learning on a variety of text domains.

Inputs & Outputs

Inputs

Pangram 3.1 can receive text inputs of 75 words minimum up to 75,000 characters in length.

Outputs

Pangram 3.1 splits the input text into an array of windows. Each window returns the following predictions:

label (Discrete classification)

One of: AI-Generated, Moderately AI-Assisted, Lightly AI-Assisted, or Human-Written.

ai_assistance_score (Float)

A score between 0.0 and 1.0 indicating the level of AI involvement, where 0.0 means no AI assistance and 1.0 means fully AI-generated.

confidence (Discrete classification)

The model's confidence level for this classification (High, Medium, or Low).

Model Resolution

Pangram is accurate to a resolution of approximately 75 words. This means that shorter segments of human text interspersed with AI text may be classified as AI assisted, but Pangram will not be able to distinguish the human and AI parts at the word- or sentence-level.

Supported Languages

English Spanish French Portuguese Arabic Chinese Japanese Korean Norwegian Russian Turkish Hungarian German Dutch Swedish Romanian Ukrainian Polish Italian Czech Greek Hindi

Pangram can work effectively on languages outside of the official support set due to generalization from the LLM backbone. Please contact us to request an evaluation on other languages outside the official support set.

Model Training

Training Datasets

Pangram's human written corpus is primarily composed of long-form prose from a wide and balanced variety of domains. These domains include: essays, creative writing, reviews, books, Wikipedia articles, news articles, scientific papers, and general web text. All data is either owned by Pangram, licensed by Pangram under private agreements, or licensed openly for commercial use.

Pangram's AI corpus is generated fully in-house using synthetic mirroring techniques. AI-edited data is also generated using the EditLens methodology.

Architecture

Pangram 3.1 has a context window of 1024 tokens. The tokenizer is a standard multilingual tokenizer.

The architecture is a modern decoder-only transformer with a classification head attached. The classification head emits K logits, each corresponding to a different level of AI pervasiveness present within the text.

These K logits are then decoded into the four categories: "Human-Written", "Lightly AI Assisted", "Moderately AI Assisted" and "Fully AI-Generated". The model is trained using QLoRA, targeting all layers.

Pangram employs both loss-weighting and calibration such that a false positive is 5x less likely to occur than a false negative.

Hardware Trained for 3 days on 8 NVIDIA H100 GPUs.

Software Trained using the PyTorch and HuggingFace libraries.

Model Inference

Preprocessing

Pangram is whitespace and case-insensitive. Characters are converted to a standard format before tokenization. Augmentations such as paraphrasing, translation, and cropping are applied before training.

When files are input to Pangram, PyMuPDF is used for PDF parsing.

Inference

On documents longer than 1024 tokens, Pangram uses an algorithm called Adaptive Boundaries to segment the document into windows, and compute an AI likelihood per window.

Adaptive Boundaries runs a two-pass, sentence-aware pipeline: it first builds static-sized overlapping windows over the text and gets AI-likelihood scores for each window. Then, it marks regions that are uncertain or where adjacent windows flip labels, and constructs finer-grained second-pass windows around those areas to predict the boundaries between AI and human text.

It then aggregates window scores onto sentences and clusters consecutive sentences with similar probabilities into coherent spans, enforcing variance and token-length limits. The endpoint returns these spans with ai_likelihood, label/confidence, and character indices.

Postprocessing

Postprocessing includes a hysteresis and non-maximal suppression step to combine windows with similar predictions and also remove outliers.

Logits are also adjusted to bias the model towards avoiding false positives.

Inference Latency

300-500ms expected on the API.

Evaluations

Methodology

+
In-domain test set evaluation — held-out examples from the same datasets used for training, useful for understanding model fit.
+
Out-of-domain evaluation — completely held-out sources and domains, useful for understanding generalization.
+
External benchmarks — useful for field comparison, but should not be trusted as a current measure once released, as benchmarks can be trivially trained on.

Human Datasets — False Positive Rate

Dataset	FPR	N
Academic Writing (English)	0.03%	62,971
Academic Writing, Google Translated	0.17%	600
Amazon Reviews (Multilingual)	0.00%	10,425
News (Multilingual)	0.07%	100,119
Creative Writing, long-form, English	0.00%	10,495
Poetry	3.50%	12,769
Biomedical Research Papers	0.05%	65,053
How-To Articles, Multilingual	0.16%	166,194
AWS Documentation	0.00%	11,652
Speeches	0.00%	1,058
Movie Scripts	0.00%	9,989
Recipes	0.10%	22,421

AI Datasets by Domain — False Negative Rate

Dataset	FNR	N
Academic Writing	0.00%	48,443
Creative Writing	0.20%	41,940
Chatbot Arena, Random Sample	2.80%	2,536

3rd Party Benchmarks

Dataset	Accuracy	FPR	FNR	N
Liang et al. (2023): Nonnative English	100%	0%	—	91
Russell et al. (2024): Human Detectors	100%	0%	0%	300
Dugan et al. (2024): RAID	99.44%	0.05%	0.69%	66,855

AI Assistance Score

Fully Human-Written

The document is either fully written by a human, or there is extremely minor AI assistance such that the strong majority of the document would be considered to be human-authored.

Lightly AI Assisted

Light AI assistance typically indicates surface-level changes that do not affect the underlying ideas, structure, or content of the text. Light assistance includes spelling and grammar fixes, updated phrasing, translation, and readability changes.

Moderately AI Assisted

Moderate AI assistance typically indicates changes where AI may have rewritten significant portions of the text or added content of its own. Moderate assistance includes adding additional details or clarifications, making tone adjustments, restructuring text, or rewriting the text in a different style or tone.

Fully AI Generated

Text classified as fully AI-generated is typically straight from an AI model like ChatGPT, but may also be originally human but significantly rewritten or assisted by AI such that AI is the primary author. This category also includes text that is primarily AI-generated, or text that was initially generated by AI.

Changes from Pangram 3.0

+Performance on creative writing is significantly improved
+The sensitivity of the "Lightly AI-Assisted" label is decreased: fewer texts will flag as Lightly AI-Assisted, more AI assistance is required to trigger the Lightly AI Assisted class
+Performance on multilingual documents is improved
+Performance on Gemini 3, GPT 5.2, and Opus 4.5 is improved
+Performance on AI text translated by AI models is significantly improved

Intended Usage and Limitation

Pangram 3.1 is intended to be used on long-form writing samples in complete sentences.

Bullet point lists, instructions and technical manuals, tables of contents, reference sections, templated or automated writing, and dense mathematical equations are more susceptible to false positives than other domains.

For best results, human-written instructions, headers, footers, and other extraneous formatting should be removed before checking for AI-generated text.

Due to potential errors in the PDF parser, raw text and .docx files are recommended input file formats over PDFs when available.

Ethics & Safety

False accusations of AI usage can lead to serious consequences, including reputational damage, emotional trauma, and other undue harm.

We acknowledge that our model has a non-zero error rate and its errors may result in such harms. We commit to continuing to engage with our users and the academic community to educate others on appropriately contextualizing and communicating the results of AI detection software.

We take reports of false positives extremely seriously and work to mitigate their occurrences to the best of our team's ability.

We commit to training and releasing models with the lowest possible false positive rate, and improve our evaluations to continue to thoroughly test and monitor future model releases for regressions.

Pangram Labs — Model Card for Pangram 3.1

Subscribe
to our updates

Stay informed with our latest news and offers.

soc2

SOC2 TYPE2

Verified by AssuranceLab

© 2025 Pangram. All rights reserved.

info@pangram.com

Join our Community

© 2025 Pangram. All rights reserved.