Pangram 3.1
January 16, 2026 · Bradley Emi · Katherine Thai · Elyas Masrour · Max Spero

We launched this model during snowy weather in Brooklyn, NY.
Model Description
Pangram 3.1 is an AI detection model released by Pangram Labs on January 16, 2026. It is the successor to Pangram 3.0, which was the first Pangram AI detection model with the capability to classify homogeneous mixed text, using technology from the EditLens technical report.
Pangram 3.1 is a transformer-based architecture adapted for sequence classification, trained with deep learning on a wide variety of text domains.
Inputs & Outputs
Inputs
Pangram 3.1 accepts text inputs from a minimum of 75 words up to a maximum of 75,000 characters in length.
Outputs
Pangram 3.1 splits the input text into an array of windows. For each window, the model returns the following predictions (a sketch of a possible response structure follows the list):
One of: AI-Generated, Moderately AI-Assisted, Lightly AI-Assisted, or Human-Written.
A score between 0.0 and 1.0 indicating the level of AI involvement, where 0.0 means no AI assistance and 1.0 means fully AI-generated.
The model's confidence level for this classification (High, Medium, or Low).
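For concreteness, here is a minimal sketch of how a per-window prediction might be represented. The field names and types are illustrative assumptions, not the official API schema.

```python
from dataclasses import dataclass
from typing import List, Literal

# Hypothetical field names; the official response format may differ.
Label = Literal[
    "AI-Generated",
    "Moderately AI-Assisted",
    "Lightly AI-Assisted",
    "Human-Written",
]

@dataclass
class WindowPrediction:
    label: Label                                  # one of the four classes
    ai_likelihood: float                          # 0.0 = no AI assistance, 1.0 = fully AI-generated
    confidence: Literal["High", "Medium", "Low"]  # confidence in the classification
    start_char: int                               # character offsets of the window in the input
    end_char: int

@dataclass
class DocumentResult:
    windows: List[WindowPrediction]

# What a result for a two-window document could look like:
example = DocumentResult(windows=[
    WindowPrediction("Human-Written", 0.02, "High", 0, 812),
    WindowPrediction("AI-Generated", 0.97, "High", 812, 1650),
])
```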
Model Resolution
Pangram is accurate to a resolution of approximately 75 words. This means that shorter segments of human text interspersed with AI text may be classified as AI-assisted, but Pangram will not be able to distinguish the human and AI parts at the word or sentence level.
Supported Languages
Pangram can work effectively on languages outside of the official support set due to generalization from the LLM backbone. Please contact us to request an evaluation on a language that is not officially supported.
Model Training
Training Datasets
Pangram's human written corpus is primarily composed of long-form prose from a wide and balanced variety of domains. These domains include: essays, creative writing, reviews, books, Wikipedia articles, news articles, scientific papers, and general web text. All data is either owned by Pangram, licensed by Pangram under private agreements, or licensed openly for commercial use.
Pangram's AI corpus is generated fully in-house using synthetic mirroring techniques. AI-edited data is also generated using the EditLens methodology.
Architecture
Pangram 3.1 has a context window of 1028 tokens. The tokenizer is a standard multilingual tokenizer.
The architecture is a modern decoder-only transformer with a classification head attached. The classification head emits K logits, each corresponding to a different level of AI pervasiveness present within the text.
These K logits are then decoded into the four categories: "Human-Written", "Lightly AI-Assisted", "Moderately AI-Assisted", and "Fully AI-Generated". The model is trained using QLoRA, targeting all layers.
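As a rough illustration only, a K-logit classification head on top of a decoder-only backbone might look like the following PyTorch sketch. The value of K, the pooling strategy, and the argmax decoding rule are assumptions; the actual design details are not published here.

```python
import torch
import torch.nn as nn

LABELS = ["Human-Written", "Lightly AI-Assisted",
          "Moderately AI-Assisted", "Fully AI-Generated"]

class AIPervasivenessHead(nn.Module):
    """Toy classification head: maps the final hidden states of a decoder-only
    backbone to K logits, one per level of AI pervasiveness (K = 4 assumed)."""

    def __init__(self, hidden_size: int, k: int = 4):
        super().__init__()
        self.proj = nn.Linear(hidden_size, k)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        pooled = hidden_states[:, -1, :]   # last-token pooling (assumption)
        return self.proj(pooled)           # (batch, K) logits

def decode(logits: torch.Tensor) -> list[str]:
    # Simplest possible decoding: argmax over the K logits,
    # assuming a one-to-one mapping onto the four labels.
    return [LABELS[i] for i in logits.argmax(dim=-1).tolist()]
```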
Pangram employs both loss-weighting and calibration such that a false positive is 5x less likely to occur than a false negative.
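The exact weighting and calibration scheme is not specified, but the general idea of asymmetric loss-weighting can be sketched as follows: errors on human-written examples (potential false positives) are penalized more heavily than errors on AI text. The 5x class weight below simply mirrors the stated design target and is not the actual value used.

```python
import torch
import torch.nn as nn

# Illustrative only: weight the human-written class so that misclassifying a
# human-written example (a potential false positive) costs more than missing
# AI text. Index 0 = "Human-Written" is an assumption about label ordering.
class_weights = torch.tensor([5.0, 1.0, 1.0, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 4)            # (batch, K) outputs from the classification head
targets = torch.randint(0, 4, (8,))   # ground-truth pervasiveness levels
loss = criterion(logits, targets)
```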
Model Inference
Preprocessing
Pangram is whitespace- and case-insensitive: characters are converted to a standard format before tokenization. Augmentations such as paraphrasing, translation, and cropping are applied before training.
When files are input to Pangram, PyMuPDF is used for PDF parsing.
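A minimal sketch of this kind of normalization and PDF text extraction is shown below, assuming Unicode NFKC normalization, lowercasing, and whitespace collapsing; the exact standard format and PyMuPDF options Pangram uses are not specified.

```python
import re
import unicodedata

import fitz  # PyMuPDF

def normalize(text: str) -> str:
    """Whitespace- and case-insensitive normalization (illustrative)."""
    text = unicodedata.normalize("NFKC", text)   # standardize character forms
    text = text.lower()                          # case-insensitive
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    return text

def extract_pdf_text(path: str) -> str:
    """Pull raw text out of a PDF with PyMuPDF before normalization."""
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)
```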
Inference
On documents longer than 1028 tokens, Pangram uses an algorithm called Adaptive Boundaries to segment the document into windows and compute an AI likelihood per window.
Adaptive Boundaries runs a two-pass, sentence-aware pipeline: it first builds static-sized overlapping windows over the text and gets AI-likelihood scores for each window. Then, it marks regions that are uncertain or where adjacent windows flip labels, and constructs finer-grained second-pass windows around those areas to predict the boundaries between AI and human text.
It then aggregates window scores onto sentences and clusters consecutive sentences with similar probabilities into coherent spans, enforcing variance and token-length limits. The endpoint returns these spans with ai_likelihood, label/confidence, and character indices.
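The pipeline described above can be summarized in the simplified sketch below. The `score_window` callable stands in for the classifier, and every threshold, window size, and clustering tolerance is an illustrative assumption rather than a Pangram parameter.

```python
from typing import Callable, List, Tuple

def adaptive_boundaries(
    sentences: List[str],
    score_window: Callable[[str], float],   # hypothetical per-window classifier
    window_size: int = 8,                   # sentences per first-pass window (assumption)
    stride: int = 4,
    uncertain: Tuple[float, float] = (0.3, 0.7),
) -> List[dict]:
    if not sentences:
        return []

    # Pass 1: static-sized overlapping windows, each scored independently.
    windows = []
    for start in range(0, max(len(sentences) - window_size, 0) + 1, stride):
        text = " ".join(sentences[start:start + window_size])
        windows.append({"start": start,
                        "end": min(start + window_size, len(sentences)),
                        "score": score_window(text)})

    # Flag sentences in regions that are uncertain or where adjacent windows flip labels.
    flagged = set()
    for i, w in enumerate(windows):
        is_uncertain = uncertain[0] <= w["score"] <= uncertain[1]
        flipped = i > 0 and (windows[i - 1]["score"] >= 0.5) != (w["score"] >= 0.5)
        if is_uncertain or flipped:
            flagged.update(range(w["start"], w["end"]))

    # Pass 2: finer-grained windows around flagged sentences; otherwise average
    # the first-pass windows covering each sentence.
    scores = []
    for idx in range(len(sentences)):
        if idx in flagged:
            lo, hi = max(0, idx - 1), min(len(sentences), idx + 2)
            scores.append(score_window(" ".join(sentences[lo:hi])))
        else:
            covering = [w["score"] for w in windows if w["start"] <= idx < w["end"]]
            scores.append(sum(covering) / len(covering) if covering else 0.0)

    # Cluster consecutive sentences with similar probabilities into coherent spans.
    spans = []
    span_start, span_scores = 0, [scores[0]]
    for idx in range(1, len(sentences)):
        if abs(scores[idx] - scores[idx - 1]) > 0.25:   # assumed similarity tolerance
            spans.append({"start": span_start, "end": idx,
                          "ai_likelihood": sum(span_scores) / len(span_scores)})
            span_start, span_scores = idx, []
        span_scores.append(scores[idx])
    spans.append({"start": span_start, "end": len(sentences),
                  "ai_likelihood": sum(span_scores) / len(span_scores)})
    return spans
```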
Postprocessing
Postprocessing includes a hysteresis and non-maximal suppression step that combines windows with similar predictions and removes outliers.
Logits are also adjusted to bias the model towards avoiding false positives.
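A toy version of these two steps might look like the following; the similarity tolerance and logit offset are placeholders, not Pangram's calibrated values.

```python
from typing import List

def merge_similar_windows(scores: List[float], tolerance: float = 0.15) -> List[dict]:
    """Combine consecutive windows with similar predictions into one span
    (a simplified stand-in for the hysteresis / non-maximal suppression step)."""
    merged: List[dict] = []
    for i, s in enumerate(scores):
        if merged and abs(s - merged[-1]["ai_likelihood"]) <= tolerance:
            prev = merged[-1]
            n = prev["end"] - prev["start"]
            prev["ai_likelihood"] = (prev["ai_likelihood"] * n + s) / (n + 1)
            prev["end"] = i + 1
        else:
            merged.append({"start": i, "end": i + 1, "ai_likelihood": s})
    return merged

def biased_is_ai(logit: float, offset: float = -1.0) -> bool:
    """Shift the raw logit before thresholding so that borderline cases fall on
    the human side, reducing false positives (placeholder offset)."""
    return (logit + offset) > 0.0
```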
Inference Latency
Evaluations
Methodology
- In-domain test set evaluation: held-out examples from the same datasets used for training, useful for understanding model fit.
- Out-of-domain evaluation: completely held-out sources and domains, useful for understanding generalization.
- External benchmarks: useful for field comparison, but should not be trusted as a current measure once released, as benchmarks can be trivially trained on.
Human Datasets — False Positive Rate
| Dataset | FPR | N |
|---|---|---|
| Academic Writing (English) | 0.01% | 62,971 |
| Academic Writing, Google Translated | 0.17% | 600 |
| Amazon Reviews (Multilingual) | 0.00% | 10,425 |
| News (Multilingual) | 0.06% | 100,119 |
| Creative Writing, long-form, English | 0.00% | 10,495 |
| Poetry | 0.52% | 12,769 |
| Biomedical Research Papers | 0.05% | 65,053 |
| How-To Articles, Multilingual | 0.03% | 166,194 |
| AWS Documentation | 0.00% | 11,652 |
| Speeches | 0.00% | 1,058 |
| Movie Scripts | 0.00% | 9,989 |
| Recipes | 0.08% | 22,421 |
AI Datasets by Domain — False Negative Rate
| Dataset | FNR | N |
|---|---|---|
| Academic Writing | 0.00% | 48,443 |
| Creative Writing | 0.20% | 41,940 |
| Chatbot Arena, Random Sample | 2.8% | 2,536 |
3rd Party Benchmarks
| Dataset | Accuracy | FPR | FNR | N |
|---|---|---|---|---|
| Liang et al. (2023): Nonnative English | 100% | 0% | — | 91 |
| Russell et al. (2024): Human Detectors | 100% | 0% | 0% | 300 |
| Dugan et al. (2024): RAID | — | — | — | — |
| Jabarian and Imas (2025): UChicago Study | — | — | — | — |
AI Assistance Score
Fully Human-Written
The document is either fully written by a human, or AI assistance is so minor that the vast majority of the document would be considered human-authored.
Lightly AI Assisted
Light AI assistance typically refers to surface-level edits that do not affect the text's underlying ideas, structure, or content. Light assistance includes spelling and grammar corrections, wording updates, translation, and readability adjustments.
Moderately AI Assisted
Moderate AI assistance typically indicates changes where AI may have rewritten significant portions of the text or added content of its own. Moderate assistance includes adding details or clarifications, adjusting tone, restructuring the text, or rewriting it in a different style.
Fully AI Generated
Text classified as fully AI-generated typically comes straight from an AI model like ChatGPT, but it may also be text that was originally human-written and then rewritten or assisted by AI to the point where AI is the primary author. Text that was initially generated by AI or is otherwise primarily AI-generated also falls into this category.
Changes from Pangram 3.0
- Performance on creative writing is significantly improved
- The sensitivity of the "Lightly AI-Assisted" label is decreased: fewer texts will be flagged as Lightly AI-Assisted, and more AI assistance is required to trigger the class
- Performance on multilingual documents is improved
- Performance on Gemini 3, GPT 5.2, and Opus 4.5 is improved
- Performance on AI text translated by AI models is significantly improved
Intended Usage and Limitations
Pangram 3.1 is intended to be used on long-form writing samples in complete sentences.
Bullet point lists, instructions and technical manuals, tables of contents, reference sections, templated or automated writing, and dense mathematical equations are more susceptible to false positives than other domains.
For best results, human-written instructions, headers, footers, and other extraneous formatting should be removed before checking for AI-generated text.
Due to potential errors in the PDF parser, raw text and .docx files are recommended input file formats over PDFs when available.
Ethics & Safety
False accusations of AI usage can lead to serious consequences, including reputational damage, emotional trauma, and other undue harm.
We acknowledge that our model has a non-zero error rate and its errors may result in such harms. We commit to continuing to engage with our users and the academic community to educate others on appropriately contextualizing and communicating the results of AI detection software.
We take reports of false positives extremely seriously and work to mitigate their occurrences to the best of our team's ability.
We commit to training and releasing models with the lowest possible false positive rate, and to improving our evaluations so that we can continue to thoroughly test and monitor future model releases for regressions.
