Is Pangram useful?
March 30, 2026
It's useful for something, just not for detecting AI-generated text.
It's pretty easy to sneak AI-generated text past Pangram with enough effort: As long as you are willing to resample paragraphs until you get a "human-written" result, there is no reliable way to detect it. The largest models are capable of learning how to produce human-sounding text.1
Worse, while failing to catch anyone motivated enough, Pangram also incorrectly flags snippets of Aquinas and entire Roon tweets as AI-generated roughly 2-5% of the time at tweet length. That error rate is about on par with the 2% miss rate implied by the 98% overall TPR reported in Russell et al., 2025 (Table 2), and Reddit users have noticed the same problem.
Interestingly, false positives seem to decrease for longer samples of text. This is good, but it's still trivial to obscure any number of AI-generated paragraphs by rephrasing them until they pass the detector, and then stitching them together.
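A minimal sketch of why resampling is so cheap. It assumes each regeneration is an independent draw with a fixed chance of passing, which is a simplification. The function names and the `pass_rate` parameter are hypothetical; the 0.76 used as an illustration is the share of simulated 20-word Aquinas passages that Pangram labelled human in the results further down.

```python
def expected_attempts(pass_rate: float) -> float:
    """Mean number of generations until one passes (geometric distribution)."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return 1 / pass_rate


def expected_total_attempts(pass_rate: float, paragraphs: int) -> float:
    """Expected generations to get every paragraph past the detector,
    resampling each paragraph on its own until it passes."""
    return paragraphs * expected_attempts(pass_rate)


# Assumption: independent attempts, each passing 76% of the time.
print(round(expected_attempts(0.76), 2))            # about 1.32 per paragraph
print(round(expected_total_attempts(0.76, 10), 1))  # about 13.2 for ten paragraphs
```

Under these assumptions the cost grows only linearly with the number of paragraphs, so stitching a long document together is not meaningfully harder than passing one paragraph.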
Pangram claims this system produces false positives for one in ten thousand human-written passages, but in my tests errors were far more common than that: between 0.7% and 4.8% of human-written passages were flagged, depending on source and length.
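As a rough check on how far apart the claim and the observations are, the sketch below computes a binomial tail probability. It assumes the samples are independent and that the one-in-ten-thousand figure is meant to apply to passages this short, which Pangram may not intend. The function name `binomial_tail` is my own; the counts (13 flagged out of 271, and 1 out of 150) come from the false positive table in the Details section.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return total


claimed_fpr = 1e-4  # one in ten thousand

# Chance of seeing at least 13 flags in 271 human samples at the claimed rate.
print(binomial_tail(271, 13, claimed_fpr))

# Even the best row (1 flag in 150 samples) would be uncommon at that rate.
print(binomial_tail(150, 1, claimed_fpr))
```

The first probability is vanishingly small, and the second is on the order of one or two percent, so under these assumptions the observed counts are hard to square with the claimed rate.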
Should schools use it to detect cheating?
No. At these false positive rates, if Aquinas had taken your class, he would probably have been flagged for cheating after just a few dozen short (<60-word) answers to test questions. Students' grades should not be affected by Pangram's judgment. Plus, a sufficiently motivated student has access to Pangram and can always work around it.
It's similar to "you won't always have a calculator in your pocket" from the 2000s. We all do have calculators in our pockets at all times now. You will have to physically observe students writing in order to guarantee they are writing it themselves.
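The "few dozen answers" estimate can be checked with a short calculation. This is a sketch that assumes every answer is flagged independently at the same rate; the 2% figure is the false positive rate measured for Aquinas at a 40-word minimum, and the function names are hypothetical.

```python
import math


def chance_of_any_flag(fpr: float, answers: int) -> float:
    """Probability that at least one of `answers` honest answers is flagged."""
    return 1 - (1 - fpr) ** answers


def answers_until_likely_flag(fpr: float, target: float = 0.5) -> int:
    """Smallest number of answers at which the chance of a flag reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - fpr))


print(round(chance_of_any_flag(0.02, 36), 2))  # 0.52 after three dozen answers
print(answers_until_likely_flag(0.02))         # 35 answers for a coin-flip chance
```

So at a 2% per-answer rate, an entirely honest student is more likely than not to be flagged at least once within about 35 short answers.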
Then what is it useful for?
Fingerprinting, regardless of whether the text is AI-generated or not.
The idea is meant to be that if I put AI-generated text into it, I can reliably identify that it is not human-written. In practice, this is not that reliable of a process.
What is reliable is Pangram's ability to show that real and simulated text come from different distributions. So, although we can get plenty of simulated Aquinas past it, the scores still make it clear that Thomas Aquinas did not write those passages, and likewise that the tweets which pass as human-written were not written by Roon.
Roon
| Text | Min Words | Mean Score | Median Score |
|---|---|---|---|
| Actual | 20 | 0.0504 | 0.0087 |
| Actual | 40 | 0.0438 | 0.0071 |
| Simulated | 20 | 0.1313 | 0.0666 |
| Simulated | 40 | 0.1081 | 0.0300 |
Aquinas
| Text | Min Words | Mean Score | Median Score |
|---|---|---|---|
| Actual | 20 | 0.0707 | 0.0204 |
| Actual | 40 | 0.0892 | 0.0195 |
| Simulated | 20 | 0.2949 | 0.0698 |
| Simulated | 40 | 0.3078 | 0.0695 |
And Pangram is clearly the best system for that: it seems to tell two voices apart better than any other system. It purports to identify "the LLM voice," which it basically can, but only if the output is zero-shot and the model is not masking its own voice.
So, they should be selling to Palantir and the government, not to schools. And schools should not use this product for what it is sold as. But it is still valuable, as a forensics tool.
If we can use it to tell two authors apart, can't we still use it for school?
Not really, because you need a reference corpus of the student's verified human writing to compare against. Good luck getting an uncontaminated sample today.
Details
Aquinas corpus: English translations of Thomas Aquinas's Sententia libri Ethicorum (Commentary on the Nicomachean Ethics) and related commentaries. 1,030 unique samples tested, ~59,232 words (avg 58 words/sample). Chosen for rigid structure, verbose technical reasoning, and low perplexity (LLM-sounding).
Roon corpus: Top tweets by @tszzl (Jul 2022 – Jan 2023), sourced from the GPTweets dataset. Filtered to exclude replies and link-only tweets. Chosen for brevity, high perplexity, creative density.
Model: Claude Opus 4.6 with thinking, multi-turn conversation with iterative score feedback.
Sampling: Entire sentences are taken until the minimum word count is reached. This was done to avoid inflating the false positive rate by submitting half-sentences.
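The sampling rule can be sketched as follows. This is not the script used for the results above: the function name and the regex-based sentence splitter are assumptions, and a naive splitter like this one will mishandle abbreviations and inline citations.

```python
import re
from typing import Optional


def take_whole_sentences(text: str, min_words: int) -> Optional[str]:
    """Return leading whole sentences of `text` until at least `min_words`
    words are collected, or None if the text is too short."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    taken = []
    word_count = 0
    for sentence in sentences:
        if not sentence:
            continue
        taken.append(sentence)
        word_count += len(sentence.split())
        if word_count >= min_words:
            return " ".join(taken)
    return None


text = "One two three. Four five six seven. Eight nine."
print(take_whole_sentences(text, 5))   # One two three. Four five six seven.
print(take_whole_sentences(text, 20))  # None
```

Because sentences are never cut, samples usually overshoot the minimum, which is consistent with the average lengths in the tables below exceeding each minimum word count.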
False Positive Rate on Human Text
Rate at which genuine human writing is incorrectly flagged as AI-generated. Lower FPR is better.
| Source | Min Words | Avg Length | n | FPR% (n) | Mean Score | Median Score | Range |
|---|---|---|---|---|---|---|---|
| aquinas | 20 | 36 words | 271 | 4.8% (13) | 0.0987 | 0.0171 | 0.0040 – 0.9232 |
| aquinas | 40 | 57 words | 254 | 2.0% (5) | 0.0840 | 0.0195 | 0.0045 – 0.8049 |
| aquinas | 50 | 66 words | 235 | 3.0% (7) | 0.1088 | 0.0223 | 0.0046 – 0.8056 |
| aquinas | 60 | 76 words | 216 | 1.9% (4) | 0.0981 | 0.0235 | 0.0042 – 0.6412 |
| aquinas | 80 | 94 words | 150 | 0.7% (1) | 0.0984 | 0.0274 | 0.0043 – 0.6396 |
| roon | 20 | 34 words | 109 | 1.8% (2) | 0.0504 | 0.0087 | 0.0032 – 0.7167 |
| roon | 40 | 45 words | 35 | 2.9% (1) | 0.0438 | 0.0071 | 0.0038 – 0.5394 |
At 20 words, Aquinas's writing is flagged as AI more often than Roon's (4.8% vs. 1.8%), and his false positive rate generally falls as passages get longer. Roon's rate rises from 1.8% to 2.9% at 40 words, but that figure rests on a single flagged tweet out of 35.
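With counts this small, the rates carry wide uncertainty. As a sketch, the Wilson score interval below puts rough 95% bounds on two of the rows. The choice of the Wilson interval is mine, not part of the original analysis, and it assumes the samples are independent.

```python
import math


def wilson_interval(flagged: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 for roughly 95%)."""
    p = flagged / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts taken from the false positive table above.
for label, flagged, n in [("aquinas, 20 words", 13, 271), ("roon, 40 words", 1, 35)]:
    lo, hi = wilson_interval(flagged, n)
    print(f"{label}: {flagged}/{n} flagged, 95% interval {lo:.1%} to {hi:.1%}")
```

The Roon 40-word interval spans roughly 0.5% to 14.5%, so the apparent rise with length should not be read as a trend. The Aquinas 20-word interval stays well above one in ten thousand even at its lower end.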
Score Distribution
False Positive Samples
| Text | Score | Label | Confidence |
|---|---|---|---|
| First, he declares his intention. We must consider that those things that are productive or preservative of goods in themselves, or restrictive of contraries, are called good because they are useful, and the nature of absolute good does not belong to the merely useful. | 0.5568 | AI-Generated | Low |
| Then, at in what way (1096b26), he handles a pertinent query. This inquiry belongs here, since predication according to different reasons is made in the first of two ways according to meanings that are without any relation to any one thing. | 0.5666 | AI-Generated | Low |
| Those things sought—that is, pursued or chosen, and desired or loved for themselves—are good according to one species or form of goodness. | 0.6254 | AI-Generated | Low |
| This particular injustice, which we discussed before [913–26], is different from legal injustice. A man may be called unjust in some measure not as being completely evil, but as being partially evil; for example, someone is called cowardly according to a particular evil. | 0.6435 | AI-Generated | Low |
| Other times it seems unfitting, to those following reason, that the equitable—as something beyond the just—is praiseworthy. Either the just thing or the equitable (which is other than the just) is not good. | 0.6632 | AI-Generated | Low |
| He says first that everything that has been said for either side of the doubt is in some way right and that, if correctly understood, no opposition lies hidden there. | 0.6688 | AI-Generated | Low |
| Consequently, he does not suffer injustice, but only undergoes some damage. 1069. Then, at however, it is obvious (1136b25), he determines the truth. | 0.7046 | Moderately AI-Assisted | Low |
| To clarify this, we must note that the verb “is” itself is sometimes predicated in an enunciation, as in “Socrates is.” By this we intend to signify that Socrates really is. | 0.7098 | AI-Generated | Low |
| It is in this man’s power to give, but not in his power to suffer injustice—there must be someone who inflicts the injustice. | 0.7722 | AI-Generated | Low |
| If a man wishes to share injustice in unfair portions, he who judges unjustly to curry someone’s favor has more good than belongs to him. | 0.7979 | AI-Generated | Low |
| For others—namely, the very wicked and the incurably evil—no particle of these goods is useful, but everything is harmful. | 0.8561 | AI-Generated | Medium |
| Likewise, we praise the kind of man who does it—we even call him a manly and perfect individual. So it is evident that, when we transfer praise to what is equitable, or to a person, as if to a greater good, we show what is equitable as something better than what is just. | 0.8857 | AI-Generated | Medium |
| It is a just thing, and it is better than one kind but not better than what is naturally just that is laid down absolutely or universally. | 0.9232 | AI-Generated | High |
13 samples
True Positive Rate on AI-Generated Text
Rate at which AI-generated text is correctly detected. Higher TPR is better.
| Source | Min Words | Avg Length | n | TPR% (n) | FNR% (n) | Mean Score | Median Score |
|---|---|---|---|---|---|---|---|
| aquinas | 20 | 26 words | 25 | 24.0% (6) | 76.0% (19) | 0.2949 | 0.0698 |
| aquinas | 40 | 49 words | 25 | 32.0% (8) | 68.0% (17) | 0.3078 | 0.0695 |
| aquinas | 60 | 71 words | 25 | 16.0% (4) | 84.0% (21) | 0.3559 | 0.3281 |
| aquinas | 80 | 259 words | 25 | 56.0% (14) | 44.0% (11) | 0.5297 | 0.6677 |
| roon | 20 | 20 words | 25 | 4.0% (1) | 96.0% (24) | 0.1313 | 0.0666 |
| roon | 40 | 51 words | 25 | 8.0% (2) | 92.0% (23) | 0.1081 | 0.0300 |
| roon | *60 | 84 words | 25 | 24.0% (6) | 76.0% (19) | 0.2742 | 0.1205 |
| roon | *80 | 126 words | 25 | 20.0% (5) | 80.0% (20) | 0.2854 | 0.0869 |
* Extrapolated length — no sampled Roon tweets exceed 60 words
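Simulated Roon tweets are harder to detect as AI-generated than simulated Aquinas. Detection generally improves for both as length increases, though not monotonically: Aquinas dips to 16.0% at 60 words, and Roon slips from 24.0% to 20.0% between 60 and 80 words.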
Score Distribution
Sample Results
| Text | Score | Label | Confidence |
|---|---|---|---|
| The glutton and the drunkard ruin the same body but the drunkard ruins his mind too, which is why I have always thought drunkenness the uglier vice of the two. | 0.0052 | Human Written | High |
| Patience is not the same thing as indifference, though lazy men would like it to be, since patience suffers evil knowingly while indifference does not notice it. | 0.0103 | Human Written | High |
| Half the errors in theology come from men who know the answer before they have finished hearing the question, and will not slow down. | 0.0104 | Human Written | High |
| Nobody ever became wise by arguing, only by stopping to think after the argument was over and the other man had gone home. | 0.0110 | Human Written | High |
| I once watched two masters argue about free will for three hours and neither one defined it, which tells you most of what went wrong. | 0.0119 | Human Written | High |
| When I was young a master told me that the worst student is not the dull one but the quick one who never checks his work, and I have found this true many times over. | 0.0131 | Human Written | High |
| He who teaches badly does more harm than he who never teaches at all, because the first man fills up the space the truth needed. | 0.0152 | Human Written | High |
| The miser does not love money properly speaking; he loves keeping it, which is a different sickness altogether and a worse one. | 0.0200 | Human Written | High |
| Habit is a strange thing: a man can pray every morning for thirty years and the one morning he skips it his whole day falls apart like a bad wheel. | 0.0273 | Human Written | High |
| A man told me envy is a small sin. I asked him whether he had ever watched it eat somebody from the inside out for twenty years. | 0.0403 | Human Written | High |
| The man who reads much and prays little is like a cook who tastes every dish but never sits down to eat, and he ends up starving on a full stomach. | 0.0437 | Human Written | High |
| Some men confuse stubbornness with constancy, but the stubborn man clings to his own opinion while the constant man clings to the truth, and these are not alike. | 0.0583 | Human Written | High |
| I knew a priest who could explain the Trinity better than anyone in Naples but could not forgive his own brother. Knowledge without charity is lame. | 0.0698 | Human Written | High |
| My old teacher once said that patience is not sitting still but holding firm, and I think he was right, because a rock sits still and nobody calls it patient. | 0.1043 | Human Written | Medium |
| People confuse humility with thinking yourself useless, but a good carpenter who knows he is good and thanks God for it is humbler than a lazy one who shrugs. | 0.1774 | Human Written | Medium |
| Nobody sins by doing what he truly thinks is good; the trouble is that a man can be badly wrong about what is good and still feel certain. | 0.4145 | Human Written | Medium |
| A brother complained that manual labor was beneath him since he was ordained. I told him Christ was a carpenter's son, and that shut him up for a week. | 0.5129 | Human Written | Low |
| A young friar asked why God permits fools to talk so much. I said perhaps so the rest of us learn when to shut our mouths. | 0.5258 | Human Written | Low |
| A man who confuses patience with laziness will never correct either fault, because he thinks doing nothing is already a virtue. | 0.5282 | Human Written | Low |
| Envy is the one vice nobody boasts about, which tells you that even the vicious know it is shameful to be wounded by another man's good. | 0.6134 | AI-Generated | Medium |
| Whether mercy belongs to the weak or the strong is settled easily: only the strong can afford it. | 0.7829 | AI-Generated | Medium |
| A student asked me whether God could make a stone he cannot lift. I told him the question is broken, not deep, like asking whether a painter can paint a smell. | 0.7860 | AI-Generated | Medium |
| Wrath disguises itself as justice so well that the angry man genuinely believes he is defending the good when he is only feeding his own wound. | 0.8404 | AI-Generated | Medium |
| Temperance is harder to praise than courage because nobody writes songs about the man who quietly refused a second cup of wine. | 0.8587 | AI-Generated | High |
| Nobody calls a blind man courageous for walking past a snake he never saw. Courage requires knowing the danger first. | 0.8914 | AI-Generated | Medium |
25 samples
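As a consistency check, the summary statistics for the 20-word simulated Aquinas row can be recomputed from the 25 scores listed above. The scores and labels are copied from the table; the variable names are my own.

```python
from statistics import mean, median

# Scores Pangram labelled "Human Written" (19 samples).
labelled_human = [
    0.0052, 0.0103, 0.0104, 0.0110, 0.0119, 0.0131, 0.0152, 0.0200, 0.0273,
    0.0403, 0.0437, 0.0583, 0.0698, 0.1043, 0.1774, 0.4145, 0.5129, 0.5258,
    0.5282,
]
# Scores Pangram labelled "AI-Generated" (6 samples).
labelled_ai = [0.6134, 0.7829, 0.7860, 0.8404, 0.8587, 0.8914]

scores = labelled_human + labelled_ai
tpr = len(labelled_ai) / len(scores)

print(round(mean(scores), 4))  # 0.2949
print(median(scores))          # 0.0698
print(f"{tpr:.1%}")            # 24.0%

# Samples scoring above 0.5 that were still labelled human.
print([s for s in labelled_human if s > 0.5])  # [0.5129, 0.5258, 0.5282]
```

The mean, median, and TPR match the table row. The last line shows that three samples scored above 0.5 yet were labelled "Human Written", so the label does not appear to be a simple 0.5 cutoff on the score; I do not know how Pangram derives it.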