
Can AI-Generated Code Be Detected?

Bradley Emi
October 7, 2025

More and more code is being written with AI every day. According to Sundar Pichai, CEO of Google, over 25% of Google's code was written by AI as of late 2024. Robinhood's CEO says that most of the code shipped at Robinhood is now written by AI. The term "vibe coding" (popularized in a tweet by Andrej Karpathy) has entered the public lexicon: fully giving in to the "vibes" of coding and letting AI take the wheel and write the code for you.

Startups such as Cursor, Lovable, and Replit are trying to remove the barrier to entry for coding: getting into programming becomes so easy that anyone at a company can produce code, or even build a full-blown website or app, without any knowledge of Python or React.

The 2025 StackOverflow Developer Survey reveals just how widespread this trend has become. 84% of developers are using or planning to use AI tools in their development workflow, with 51% of professional developers using AI tools daily. This represents a significant shift in how code is being written across the industry.

However, the survey also reveals growing pains in this AI-assisted development era. While 52% of developers report that AI tools have positively impacted their productivity, positive sentiment toward AI tools dropped from over 70% to 60% in 2025. After an initial honeymoon period of exploration, it appears that developers now feel more neutral toward these tools.

The source of frustration is telling: 66% of developers are frustrated by "AI solutions that are almost right, but not quite" and 45% find that debugging AI-generated code is more time-consuming than expected. Only 3% of developers "highly trust" AI tool output, with 46% actively distrusting AI tool accuracy.

This creates an interesting paradox: developers are increasingly relying on AI to write code, but they don't fully trust what it produces. As the survey notes, 75% of developers would still ask a human for help when they "don't trust AI's answers," positioning themselves as the "ultimate arbiters of quality and correctness." Simon Willison has said he "wouldn't use AI-generated code for projects he planned to ship out unless he had reviewed each line. Not only is there the risk of hallucination but the chatbot's desire to be agreeable means it may say an unusable idea works. That is a particular issue for those of us who don't know how to edit the code. We risk creating software with inbuilt problems."

The importance of detecting AI-generated code

While AI-generated code is here to stay, there are still contexts where it makes sense to verify that code is human-written.

  1. Hiring. When evaluating a software developer, it is important to verify that the candidate is fully capable of writing high-quality code without the assistance of AI. It is also important to assess their understanding of the code, so that they can successfully debug and diagnose faulty AI-generated or AI-assisted code on the job.

  2. In education, it is important to teach students how to program without AI assistance. With too much AI assistance, students can miss fundamental concepts and bypass learning the skills that they need in order to be successful software engineers. Although it is likely that these students will have access to AI assistance during their jobs, as alluded to by the StackOverflow developer survey, without a solid foundation, students will not be able to fix incorrect AI-generated code or even be able to understand what is wrong in the first place.

  3. Compliance and security. Many compliance frameworks consider AI-generated code to be higher risk due to potential hallucinations and bugs. There are also important licensing and copyright considerations - AI models may inadvertently reproduce code with incompatible licenses, leading to compliance violations. Additionally, there are open questions around whether AI-generated code can be considered proprietary or copyrightable.

  4. Provenance and code tracking. Before AI, tools like git blame made it easy to track who wrote each line of code and why changes were made. With AI generating large amounts of code, it becomes more difficult for developers to remember the context and reasoning behind every line. Being able to detect and track AI-generated code helps with code maintenance, debugging, and resource management. CTOs and engineering leaders can use this information to evaluate the effectiveness of different AI models and ensure their teams are using the best tools available.

Pangram's Ability to Detect AI-generated Code

Overall, Pangram is able to conservatively detect most AI-generated code, especially when the code is over 40 lines long. Pangram is conservative because it rarely flags human-written code as AI-generated, but misses about 8% of AI-generated code, falsely predicting it as human.

When looking at all code snippets, Pangram misses about 20% of AI-generated code, because most short AI code snippets are boilerplate that is indistinguishable from human code or just do not have enough signal to be detected.

Accuracy on code over 40 lines long

Metric                 Score
Accuracy               96.2% (22,128/22,997)
False Positive Rate    0.3% (39/13,178)
False Negative Rate    8.5% (830/9,819)
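The reported figures can be reproduced directly from the raw counts in the table. The sketch below derives accuracy, false positive rate, and false negative rate from the confusion counts (the function and variable names are my own, not Pangram's):

```python
def detection_metrics(tp, fn, tn, fp):
    """Return accuracy, false positive rate, and false negative rate."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total  # correct predictions over all samples
    fpr = fp / (fp + tn)          # human-written code wrongly flagged as AI
    fnr = fn / (fn + tp)          # AI-generated code missed, predicted human
    return accuracy, fpr, fnr

# Counts for code over 40 lines: 13,178 human samples (39 false positives)
# and 9,819 AI samples (830 false negatives), as in the table above.
acc, fpr, fnr = detection_metrics(tp=9_819 - 830, fn=830, tn=13_178 - 39, fp=39)
print(f"Accuracy: {acc:.1%}, FPR: {fpr:.1%}, FNR: {fnr:.1%}")
# → Accuracy: 96.2%, FPR: 0.3%, FNR: 8.5%
```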

Accuracy on all code snippets

Metric                 Score
Accuracy               89.4% (41,395/46,319)
False Positive Rate    0.4% (99/25,652)
False Negative Rate    23.3% (4,825/20,667)

Dataset

We use the GitHub dataset to perform this analysis. For the AI code, we use a simple two-stage synthetic mirroring process:

  1. Ask the LLM to provide a short summary of what the code is about.
  2. Ask the LLM to write a code sample according to the returned summary.
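The two stages above can be sketched as follows, where `llm` stands in for any prompt-to-completion call (a hypothetical placeholder for a chat-completion client, not a specific API):

```python
def mirror_code(human_code: str, llm) -> str:
    """Two-stage synthetic mirroring: summarize human-written code,
    then regenerate a code sample from the summary alone.

    `llm` is any callable mapping a prompt string to a completion string.
    """
    # Stage 1: ask the LLM for a short summary of the human-written code.
    summary = llm(f"Provide a short summary of what this code is about:\n\n{human_code}")
    # Stage 2: ask the LLM to write fresh code from that summary, yielding
    # an AI sample on the same topic as the human original.
    return llm(f"Write a code sample according to this summary:\n\n{summary}")

# Data-flow demo with a stub in place of a real model:
stub_llm = lambda prompt: f"[completion for: {prompt.splitlines()[0]}]"
result = mirror_code("def add(a, b):\n    return a + b", stub_llm)
print(result)
```

Because the second prompt sees only the summary, the generated sample shares the human file's topic but not its exact text, which keeps the human/AI pairs comparable without being copies.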

We use GPT-4o, Claude Sonnet, Llama 405b, Mistral 7B, Gemini 1.5 Flash, and Gemini 1.5 Pro to create the dataset.

Recommendations for Detecting AI-Generated Code

AI-generated code is more difficult to detect than AI-generated writing because there are significantly fewer degrees of freedom: a programmer makes fewer arbitrary stylistic choices than a writer does. Among the false negatives we observe, many files simply do not have much room for creativity or flexibility, such as boilerplate auto-generated code or configuration files. Low-level languages, such as C, Assembly, and compiler code, are also much stricter in their syntax, so there are fewer signals available to tell when code is AI-generated.

If you are looking for signs of AI-generated code, we recommend the following:

  • Comments: AI-generated code often has a very particular way of writing comments, and it tends to include many more comments than a human normally writes.
  • Internal similarity: AI-generated code is often similar to other AI-generated code, especially across submissions for a single assignment in a programming class. MOSS, the Measure of Software Similarity developed at Stanford, is available for non-commercial use and is effective at detecting code similarity; it can often pick up many similar-looking AI-generated coding assignments.
  • Pangram is able to catch a large swath of AI-generated code with very few false positives, but false negatives are common. Pangram can be trusted as a screening tool to initially catch some, but not all, AI-generated code plagiarism.
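To illustrate the comment signal described above, a rough heuristic might measure comment density, i.e. the fraction of non-blank lines that carry a comment. This is a sketch under loose assumptions: it naively treats any `#` as a comment marker (so `#` inside string literals would be miscounted), and any flagging threshold you pair it with would be an arbitrary choice, not a calibrated value.

```python
def comment_density(source: str) -> float:
    """Fraction of non-blank Python lines that are (or carry) a '#' comment."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    commented = sum(1 for ln in lines if "#" in ln)
    return commented / len(lines)

# A heavily commented snippet, in the style AI models often produce:
snippet = """\
# Initialize the counter
count = 0
# Loop over the items
for item in items:
    count += 1  # increment the counter
"""
density = comment_density(snippet)
print(f"{density:.0%} of non-blank lines carry comments")
# → 60% of non-blank lines carry comments
```

A signal like this is only suggestive on its own; it is most useful combined with the other indicators above, such as internal similarity across submissions.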