
How does AI detection work?

Alex Roitman
October 9th

Key Takeaways:

  • LLMs display certain patterns that allow AI detection tools to identify whether a piece of text is human- or AI-written.
  • Old detectors relied on burstiness and perplexity, but these signals are unreliable. Newer detectors work much better because they rely on larger datasets and active learning.
  • When choosing a detector, users should decide what false positive and false negative rates they are comfortable with. They should also determine whether they need a plagiarism checker or other features before selecting a tool.
  • Humans who haven't been trained in AI detection aren't very good at it. However, people who have been trained, or who are routinely exposed to AI-generated text, can identify AI content much better than untrained readers. Human judgment combined with detection software can work very well.

What are AI detectors?

AI can be detected. It's not black magic; it's a problem with a substantial body of research behind it. In an era where AI-generated content is increasingly prevalent in academia, media, and business, the ability to distinguish between human- and machine-authored text is a critical skill. AI makes linguistic, stylistic, and semantic choices that can all be spotted by a trained eye or by sufficiently good automated detection software. That's because we understand why AI writes the way it does, and which patterns give it away.

How do Large Language Models Work?

Before we talk about how AI detection software works, it's important to understand that language models are, at their core, probability distributions over text. A Large Language Model (LLM) like ChatGPT is a very, very complicated version of this idea: it constantly predicts the next most probable word, or "token," in a sequence. These probability distributions are learned from a massive amount of data, often encompassing a significant portion of the public internet.
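To make "probability distribution over the next token" concrete, here is a minimal sketch using the small, open GPT-2 model from the Hugging Face transformers library as a stand-in. Production chatbots are far larger, and the prompt is just an example, but the principle is the same.

```python
# Minimal sketch: a causal language model is a probability distribution over the next token.
# GPT-2 is used here as a small, open stand-in for much larger commercial models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The ability to distinguish human from machine writing is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocabulary_size)

# Probability assigned to every word in the vocabulary as the *next* token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# The five continuations the model considers most likely.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>15}  p = {prob.item():.3f}")
```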

A question that comes up often is, "Are AI language models just the average of all human writing?" The answer is clearly no. Language models are not simply averaging everything humans say. For one, language models make highly idiosyncratic choices. They are also highly biased, both by their training datasets and by the choices of their creators. Finally, modern LLMs are optimized to follow instructions and say things that users want to hear rather than for correctness or accuracy, a trait that makes them useful assistants but unreliable sources of truth.

This is a result of the modern LLM training procedure, which has three stages:

  • Pre-training: During this phase of training, the model learns the statistical patterns of language. Biases from the training data show up in these patterns. For example, data that frequently appears on the Internet, like Wikipedia, is overrepresented, which is why AI-generated text often has a formal, encyclopedic tone. Additionally, cheap, outsourced labor is used to create training data, which is how words like "delve," "tapestry," and "intricate" become extremely commonplace in AI-generated text, reflecting the linguistic norms of the data creators rather than the end-user.
  • Instruction Tuning: During this phase, the model learns to follow instructions and obey orders. The consequence is that the model learns it is better to follow instructions than to present accurate, correct information. This produces sycophantic, "people-pleasing" behavior: the AI prioritizes generating a helpful, agreeable-sounding response, even if it has to invent facts or "hallucinate" to do so.
  • Alignment: During this phase, the model learns how to say what people like and prefer. It learns what are "good" and "bad" things to say. However, this preference data can be extremely biased, often favoring responses that are neutral, safe, and inoffensive. This process can strip the model of a distinct voice, leading it to avoid controversy or strong opinions. The LLM has no underlying groundedness in truth or correctness.

Generative AI models are products released by technology companies, which intentionally inject biases and behaviors that are reflected in the models' outputs.

What are the Patterns in AI Language?

Once you understand how LLMs are trained, you can spot the "tells" of AI writing tools. It's often not one smoking gun, but the combination of all of these signals together that sets off detectors.

Language and Style

  • Word Choices: AI content writers have favorite words, such as aspect, challenges, delve, enhance, tapestry, testament, authentic, comprehensive, crucial, significant, and transformative, along with adverbs like additionally and moreover. This happens because of bias in pretraining datasets. Frequent use of these words can create a tone that is excessively formal or grandiose, often feeling out of place in a typical student essay or informal communication. (A toy word-counting sketch follows this list.)
  • Phrasing Patterns: AI writing uses phrasing patterns like as we [verb] the topic, it's important to note, not only but also, paving the way, and when it comes to. These phrases, while grammatically correct, are often used as conversational filler and can make the writing feel generic and formulaic.
  • Spelling and Grammar: AI writing generally uses perfect spelling and grammar, and likes to use complex sentences. Human writing uses a mix of simple and complex sentences, and even expert-level writers will sometimes use grammatical patterns that aren't "by the book perfect" for stylistic reasons, such as using sentence fragments for emphasis.
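As a toy illustration only, you can count how often these stock "AI-favorite" words appear per thousand words. The word list and sample text below are purely illustrative, and a word list on its own is weak evidence either way; this is not how Pangram or any serious detector actually scores text.

```python
# Toy illustration: how often do stock "AI-favorite" words appear in a passage?
# A word list alone is weak evidence; real detectors combine many signals.
import re
from collections import Counter

TELL_WORDS = {
    "delve", "delves", "tapestry", "intricate", "testament", "comprehensive",
    "crucial", "transformative", "additionally", "moreover",
}

def tell_word_rate(text: str) -> float:
    """Occurrences of tell words per 1,000 words of input."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = Counter(w for w in words if w in TELL_WORDS)
    return 1000 * sum(hits.values()) / len(words)

sample = ("Additionally, this comprehensive report delves into the intricate "
          "tapestry of modern life, a testament to transformative change.")
print(f"{tell_word_rate(sample):.1f} tell words per 1,000 words")
```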

Structure and Organization

  • Paragraphs and Sentence Structure: AI writing tends to produce very organized paragraphs that are all about the same length, along with list-like structures. This can result in a monotonous rhythm that lacks the natural variation of human writing. The same applies to sentence length. (A rough way to measure this follows this list.)
  • Introductions and Conclusions: AI-generated essays usually have a very neat introduction and conclusion, and the conclusion is often very long, starts with "Overall," or "In Conclusion," and repeats most of what was already written, essentially rephrasing the thesis and main points without adding new insight or synthesis.
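One rough, purely illustrative way to quantify that monotonous rhythm is to measure how much sentence lengths vary in a passage. The naive regex-based sentence splitting below is a simplification, and on its own this statistic proves nothing; it just makes the "uniform rhythm" idea concrete.

```python
# Rough sketch: how much do sentence lengths vary in a passage?
# Very uniform lengths are one weak structural signal, not proof of AI use.
import re
import statistics

def sentence_length_stats(text: str) -> tuple[float, float]:
    """Return (mean, standard deviation) of sentence lengths, in words."""
    # Naive sentence splitting; a real system would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return (float(lengths[0]) if lengths else 0.0), 0.0
    return statistics.mean(lengths), statistics.stdev(lengths)

sample = ("The report covers three topics. Each topic receives equal weight. "
          "Every section follows the same plan. The structure never varies at all.")
mean_len, spread = sentence_length_stats(sample)
print(f"mean sentence length: {mean_len:.1f} words, spread: {spread:.1f}")
```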

Purpose and Personality

  • Purpose and Intent: The writing is usually very vague and full of generalities. This happens because instruction tuning over-prioritizes prompt adherence, and in order to stay on topic, the model learns that it's best to be really vague and generic to minimize the risk of being incorrect.
  • Reflection and Metacognition: AI is very bad at reflecting and relating the writing to personal experiences... because it has no personal experiences that it can relate to! Human writing can show the unique voice and personal experience of its author, making connections and generating novel ideas that are not simply a remix of existing information.
  • Abrupt Shifts in Style and Tone: Sometimes there is a very jarring and abrupt shift in tone and style. This happens when a student is using AI for some of their writing, but not all of it, creating an inconsistent and disjointed final product.

How AI Detection works: Three Steps

  • Train the AI detection model: First, the model is trained. Early AI detectors didn't work very well because they relied almost entirely on perplexity and burstiness. Perplexity is how unexpected, or surprising, each word in a piece of text is to a large language model; burstiness is how much that perplexity varies over the course of a document. AI-generated text tends to score low on both, so these detectors flagged low-scoring text as AI. This approach has several flaws, however, and often fails to detect AI outputs, and early detectors also trained on only a limited dataset of text. Modern and successful models like Pangram use a much wider set of data and employ techniques like active learning to drive more accurate results. (A toy perplexity and burstiness calculation appears after this list.)
  • Input the text that needs to be classified and tokenize it: A user provides the input text. When the classifier receives it, it tokenizes it: it breaks the text down into a series of numbers, called tokens, that the model can make sense of. The model then turns each token into an embedding, a vector of numbers representing the meaning of that token.
  • Classify the text as human or AI: the input is passed through the neural network, producing an output embedding. A classifier head transforms the output embedding into a 0 or 1 prediction, where 0 is the human label and 1 is the AI label. (A schematic sketch of this pipeline also follows this list.)
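To see why the older approach was appealing but fragile, here is a sketch of a perplexity and burstiness calculation scored with the open GPT-2 model. Treating burstiness as the spread of per-token surprise is one common proxy; nothing here reflects how Pangram actually works.

```python
# Sketch of the older perplexity/burstiness approach, scored with the open GPT-2 model.
# Modern detectors, including Pangram, do not rely on these two statistics alone.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def token_surprisals(text: str) -> torch.Tensor:
    """Negative log-probability ("surprise") of each token under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Surprise of each actual token, given the tokens that came before it.
    return -log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)

surprisals = token_surprisals("Paste the document you want to score here.")
perplexity = torch.exp(surprisals.mean()).item()  # low -> every word is predictable
burstiness = surprisals.std().item()              # low -> predictability barely varies
print(f"perplexity = {perplexity:.1f}, burstiness = {burstiness:.2f}")
```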
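The tokenize, embed, and classify steps can also be sketched schematically. The encoder name, the use of the first token's embedding as a document summary, and the single-logit head below are illustrative assumptions, not Pangram's actual architecture, and the head here is untrained, so its scores are meaningless until it is fitted on labeled human and AI text.

```python
# Schematic sketch of the tokenize -> embed -> classify pipeline (illustrative only).
# A pretrained encoder produces an output embedding; a small "classifier head"
# maps that embedding to a human (0) vs. AI (1) prediction.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AITextClassifier(nn.Module):
    def __init__(self, encoder_name: str = "roberta-base"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Classifier head: output embedding -> one logit (0 = human, 1 = AI).
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, text: str) -> float:
        tokens = self.tokenizer(text, return_tensors="pt", truncation=True)
        # Use the first token's embedding as a summary of the whole document.
        output_embedding = self.encoder(**tokens).last_hidden_state[:, 0]
        logit = self.head(output_embedding)
        return torch.sigmoid(logit).item()  # probability the text is AI-generated

clf = AITextClassifier()  # the head is untrained here, so this score is meaningless
score = clf("It is important to note that this comprehensive overview delves into...")
print(f"P(AI) = {score:.2f} -> label: {'AI' if score >= 0.5 else 'human'}")
```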

What about False Positives & False Negatives?

An AI detection tool's value is measured by how many false positives and false negatives result from using the tool, usually expressed as a false positive rate (FPR) and a false negative rate (FNR). A false positive is when a detector mistakenly predicts a human-written sample as being AI-written. In contrast, a false negative is when an AI-generated sample is mispredicted as being human-written.
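As a quick worked example with invented numbers, both rates come straight from an evaluation set in which the true authorship of every document is known.

```python
# Worked example with invented numbers: computing FPR and FNR from evaluation counts.
human_docs = 1000        # documents actually written by humans
ai_docs = 1000           # documents actually generated by AI
false_positives = 2      # human documents the detector flagged as AI
false_negatives = 15     # AI documents the detector passed as human

fpr = false_positives / human_docs  # 0.002 -> 0.2% of human writers wrongly flagged
fnr = false_negatives / ai_docs     # 0.015 -> 1.5% of AI text slips through
print(f"FPR = {fpr:.1%}, FNR = {fnr:.1%}")
```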

Human vs. Automated Detection

If you choose to detect AI by eye alone, you must be trained. Non-experts cannot do better than random guessing. Even advanced linguists cannot detect AI without explicit training. Our recommendation is to use both methods for the best results, creating a more robust and fair evaluation process.

While AI content detectors can tell you whether or not something was generated by AI tools, humans can add context and nuance to that decision. A human knows the context: previous writing samples from the student, what grade-level writing looks like, and what a typical assignment response from a student looks like. This context is critical, because the appropriateness of AI use can vary dramatically depending on the assignment's instructions.

An AI detection tool's output is just the beginning. It is not conclusive evidence that a student has violated academic integrity, but rather an initial data point that warrants further, contextual investigation. AI use may be inadvertent, accidental, or even allowed within the scope of your particular assignment: it depends!

Bonus: What about Humanizers?!

Humanizers are tools used to "humanize" AI content so that it evades AI detection. Content writers often use them to change the way AI writing looks. Humanizers paraphrase text, remove telltale words, and add humanlike "mistakes" to a piece of content. Sometimes this makes the text nearly unreadable or lowers its quality significantly. Many AI detectors train their software to detect humanized text. Using a humanizer is often a risk because it can dramatically lower the quality of the text, which is particularly concerning for student work.

Now that you know how they work, try your own content. Is it AI or Human?
