Is Pangram useful?

March 30, 2026

It's useful for something, just not for detecting AI-generated text.

It's pretty easy to sneak AI-generated text past Pangram with enough effort: As long as you are willing to resample paragraphs until you get a "human-written" result, there is no reliable way to detect it. The largest models are capable of learning how to produce human-sounding text.¹

And in exchange for failing to catch anyone motivated enough, Pangram also flags genuine human writing: snippets of Aquinas and entire Roon tweets come back as AI-generated 2-4% of the time at tweet length. That is roughly on par with the 98% Overall TPR reported in Russell et al., 2025 (Table 2), and Reddit has noticed it too.

Interestingly, false positives seem to decrease for longer samples of text. This is good, but it's still trivial to obscure any number of AI-generated paragraphs by rephrasing them until they pass the detector, and then stitching them together.
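The resample-and-stitch loop described above can be sketched in a few lines. This is a minimal illustration, not the exact procedure used for the tests in this post: `rephrase` and `is_flagged` are hypothetical callables standing in for a language model and a detector, and `max_attempts` is an arbitrary cap.

```python
from typing import Callable, Iterable

def resample_until_pass(
    paragraph: str,
    rephrase: Callable[[str], str],     # hypothetical: returns a new wording
    is_flagged: Callable[[str], bool],  # hypothetical: True if the detector says "AI"
    max_attempts: int = 20,             # arbitrary cap, an assumption
) -> str:
    """Rephrase one paragraph until the detector stops flagging it."""
    candidate = paragraph
    for _ in range(max_attempts):
        if not is_flagged(candidate):
            return candidate
        candidate = rephrase(candidate)
    raise RuntimeError("no passing rephrasing found within max_attempts")

def stitch(
    paragraphs: Iterable[str],
    rephrase: Callable[[str], str],
    is_flagged: Callable[[str], bool],
) -> str:
    """Pass each paragraph through the loop separately, then join them."""
    return "\n\n".join(
        resample_until_pass(p, rephrase, is_flagged) for p in paragraphs
    )
```

The point of the sketch is that the detector is only ever asked about one short paragraph at a time, which is exactly the regime where it is weakest.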

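Pangram claims this system produces false positives for one in ten thousand human-written passages. Errors are quite clearly far more common than that: in the tests below, between 0.7% and 4.8% of genuine human passages were flagged, depending on source and length.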

¹ In fact, they can learn to compute any computable function. "Speak in Person X's voice" is one such function, which is normally computed by Person X's brain.

Should schools use it to detect cheating?

No, because if Aquinas had taken your class, after just a few dozen <60-word answers to test questions, he would have been flagged for cheating. Students' grades should not be affected by Pangram's judgment. Plus, a sufficiently motivated student has access to Pangram and can always work around it.
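To put a number on "a few dozen answers": if each short answer is flagged independently with probability p, the chance of at least one flag in n answers is 1 - (1 - p)^n. The sketch below plugs in the false positive rates measured later in this post; independence between answers is an assumption, and the answer counts are illustrative.

```python
def prob_at_least_one_flag(fpr: float, n_answers: int) -> float:
    """Chance that at least one of n independent human answers is flagged."""
    return 1.0 - (1.0 - fpr) ** n_answers

# 0.0001 is Pangram's claimed rate; the others are rates measured in this post.
for fpr in (0.0001, 0.019, 0.020, 0.048):
    for n in (12, 24, 36):
        p = prob_at_least_one_flag(fpr, n)
        print(f"FPR {fpr:.2%}, {n} answers: {p:.1%} chance of a flag")
```

At a 2% false positive rate, 36 answers already give slightly better than even odds of at least one flag. At the claimed one-in-ten-thousand rate, the same 36 answers would give well under 1%.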

It's similar to "you won't always have a calculator in your pocket" from the 2000s. We all do have calculators in our pockets at all times now. You will have to physically observe students writing in order to guarantee the work is their own.

Then what is it useful for?

Fingerprinting. Without regard for AI or not.

The pitch is that if I feed it AI-generated text, it will reliably tell me the text is not human-written. In practice, that is not very reliable.

What is reliable is Pangram's ability to tell that real and simulated text come from different distributions. So, although we can get plenty of simulated Aquinas past it, there is no way to obscure the fact that Thomas Aquinas clearly did not write those passages, and it is just as obvious that the tweets which pass as human-written were not written by Roon.

Roon

Text	Min Words	Mean Score	Median Score
Actual	20	0.0504	0.0087
Actual	40	0.0438	0.0071
Simulated	20	0.1313	0.0666
Simulated	40	0.1081	0.0300

Aquinas

Text	Min Words	Mean Score	Median Score
Actual	20	0.0707	0.0204
Actual	40	0.0892	0.0195
Simulated	20	0.2949	0.0698
Simulated	40	0.3078	0.0695

And Pangram is clearly the best system to tell you that: It seems to tell two voices from each other better than any other system. It purports to be able to identify "the LLM voice," which it basically can - but only if the output is zero-shot, and the model is not masking its own voice.

So, they should be selling to Palantir and the government, not to schools. And schools should not use this product for what it is sold as. But it is still valuable, as a forensics tool.

If we can use it to tell two authors apart, can't we still use it for school?

Not really, because you need a reference corpus of the student's verified human writing to compare against. Good luck getting an uncontaminated sample today.

Details

Aquinas corpus: English translations of Thomas Aquinas's Sententia libri Ethicorum (Commentary on the Nicomachean Ethics) and related commentaries. 1,030 unique samples tested, ~59,232 words (avg 58 words/sample). Chosen for rigid structure, verbose technical reasoning, and low perplexity (LLM-sounding).

Roon corpus: Top tweets by @tszzl (Jul 2022 – Jan 2023), sourced from the GPTweets dataset. Filtered to exclude replies and link-only tweets. Chosen for brevity, high perplexity, creative density.

Model: Claude Opus 4.6 with thinking, multi-turn conversation with iterative score feedback.

Sampling: Entire sentences are taken until the minimum word count is reached. This was done to avoid inflating the false positive rate by submitting half-sentences.
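The sampling rule can be sketched as follows. This is an assumed reconstruction: the regex sentence splitter is deliberately naive (it will split wrongly on section numbers such as "1069."), and dropping a trailing chunk that falls short of the minimum is my assumption, not something the description above specifies.

```python
import re
from typing import List

def sample_whole_sentences(text: str, min_words: int) -> List[str]:
    """Group consecutive whole sentences into samples of at least min_words."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    samples: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        current.append(sentence)
        count += len(sentence.split())
        if count >= min_words:
            samples.append(" ".join(current))
            current, count = [], 0
    # Leftover sentences below min_words are dropped (assumption).
    return samples
```

Because sentences are never cut, the average sample runs longer than the minimum, which is why the tables below show 36-word averages for a 20-word minimum.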

False Positive Rate on Human Text

Rate at which genuine human writing is incorrectly flagged as AI-generated. Lower FPR is better.

Source	Min Words	Avg Length	n	FPR% (n)	Mean Score	Median Score	Range
aquinas	20	36 words	271	4.8% (13)	0.0987	0.0171	0.0040 – 0.9232
	40	57 words	254	2.0% (5)	0.0840	0.0195	0.0045 – 0.8049
	50	66 words	235	3.0% (7)	0.1088	0.0223	0.0046 – 0.8056
	60	76 words	216	1.9% (4)	0.0981	0.0235	0.0042 – 0.6412
	80	94 words	150	0.7% (1)	0.0984	0.0274	0.0043 – 0.6396
roon	20	34 words	109	1.8% (2)	0.0504	0.0087	0.0032 – 0.7167
roon	40	45 words	35	2.9% (1)	0.0438	0.0071	0.0038 – 0.5394

At 20 words, Aquinas is flagged as AI more often than Roon (4.8% vs. 1.8%), and his rate falls as passages get longer. Roon's rate rises from 1.8% to 2.9% at 40 words, though that figure rests on a single flagged tweet out of 35.
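Since several of these counts are small, it is worth checking whether they are still incompatible with the claimed one-in-ten-thousand rate. The sketch below computes a 95% Wilson score interval for each row of the table; the choice of the Wilson interval and z = 1.96 is mine, and the counts are copied from the table above.

```python
import math
from typing import Tuple

def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for k successes out of n trials."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

CLAIMED_FPR = 1e-4  # one in ten thousand

# (source, min words, flagged, n) from the false positive table.
rows = [
    ("aquinas", 20, 13, 271), ("aquinas", 40, 5, 254), ("aquinas", 50, 7, 235),
    ("aquinas", 60, 4, 216), ("aquinas", 80, 1, 150),
    ("roon", 20, 2, 109), ("roon", 40, 1, 35),
]

for source, min_words, k, n in rows:
    low, high = wilson_interval(k, n)
    verdict = "above" if low > CLAIMED_FPR else "not clearly above"
    print(f"{source} {min_words}w: {k}/{n}, 95% CI {low:.2%} to {high:.2%}, {verdict} the claim")
```

Even for the rows with a single flagged sample, the lower end of the interval sits above 0.01%. These are small, hand-picked corpora, so the intervals describe this test set and not Pangram's performance in general.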

Score Distribution

[Figure: score distributions for human text, one panel per minimum word count (20w, 40w, 50w, 60w, 80w), with 0.5 marked in each panel.]

False Positive Samples

Text	Score	Label	Confidence
First, he declares his intention. We must consider that those things that are productive or preservative of goods in themselves, or restrictive of contraries, are called good because they are useful, and the nature of absolute good does not belong to the merely useful.	0.5568	AI-Generated	Low
Then, at in what way (1096b26), he handles a pertinent query. This inquiry belongs here, since predication according to different reasons is made in the first of two ways according to meanings that are without any relation to any one thing.	0.5666	AI-Generated	Low
Those things sought—that is, pursued or chosen, and desired or loved for themselves—are good according to one species or form of goodness.	0.6254	AI-Generated	Low
This particular injustice, which we discussed before [913–26], is different from legal injustice. A man may be called unjust in some measure not as being completely evil, but as being partially evil; for example, someone is called cowardly according to a particular evil.	0.6435	AI-Generated	Low
Other times it seems unfitting, to those following reason, that the equitable—as something beyond the just—is praiseworthy. Either the just thing or the equitable (which is other than the just) is not good.	0.6632	AI-Generated	Low
He says first that everything that has been said for either side of the doubt is in some way right and that, if correctly understood, no opposition lies hidden there.	0.6688	AI-Generated	Low
Consequently, he does not suffer injustice, but only undergoes some damage. 1069. Then, at however, it is obvious (1136b25), he determines the truth.	0.7046	Moderately AI-Assisted	Low
To clarify this, we must note that the verb “is” itself is sometimes predicated in an enunciation, as in “Socrates is.” By this we intend to signify that Socrates really is.	0.7098	AI-Generated	Low
It is in this man’s power to give, but not in his power to suffer injustice—there must be someone who inflicts the injustice.	0.7722	AI-Generated	Low
If a man wishes to share injustice in unfair portions, he who judges unjustly to curry someone’s favor has more good than belongs to him.	0.7979	AI-Generated	Low
For others—namely, the very wicked and the incurably evil—no particle of these goods is useful, but everything is harmful.	0.8561	AI-Generated	Medium
Likewise, we praise the kind of man who does it—we even call him a manly and perfect individual. So it is evident that, when we transfer praise to what is equitable, or to a person, as if to a greater good, we show what is equitable as something better than what is just.	0.8857	AI-Generated	Medium
It is a just thing, and it is better than one kind but not better than what is naturally just that is laid down absolutely or universally.	0.9232	AI-Generated	High

13 samples

True Positive Rate on AI-Generated Text

Rate at which AI-generated text is correctly detected. Higher TPR is better.

Source	Min Words	Avg Length	n	TPR% (n)	FNR% (n)	Mean Score	Median Score
aquinas	20	26 words	25	24.0% (6)	76.0% (19)	0.2949	0.0698
	40	49 words	25	32.0% (8)	68.0% (17)	0.3078	0.0695
	60	71 words	25	16.0% (4)	84.0% (21)	0.3559	0.3281
	80	259 words	25	56.0% (14)	44.0% (11)	0.5297	0.6677
roon	20	20 words	25	4.0% (1)	96.0% (24)	0.1313	0.0666
	40	51 words	25	8.0% (2)	92.0% (23)	0.1081	0.0300
	*60	84 words	25	24.0% (6)	76.0% (19)	0.2742	0.1205
	*80	126 words	25	20.0% (5)	80.0% (20)	0.2854	0.0869

* Extrapolated length — no sampled Roon tweets exceed 60 words

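Simulated Roon tweets are harder to detect as AI-generated than simulated Aquinas. Detection generally improves for both as length increases, though not monotonically: simulated Aquinas dips to 16.0% at 60 words before reaching 56.0% at 80, and simulated Roon slips from 24.0% to 20.0% between the two extrapolated lengths.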

Score Distribution

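[Figure: score distributions for AI-generated text, one panel per minimum word count (20w, 40w, 60w, 80w), with 0.5 marked in each panel.]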

Sample Results

Text	Score	Label	Confidence
The glutton and the drunkard ruin the same body but the drunkard ruins his mind too, which is why I have always thought drunkenness the uglier vice of the two.	0.0052	Human Written	High
Patience is not the same thing as indifference, though lazy men would like it to be, since patience suffers evil knowingly while indifference does not notice it.	0.0103	Human Written	High
Half the errors in theology come from men who know the answer before they have finished hearing the question, and will not slow down.	0.0104	Human Written	High
Nobody ever became wise by arguing, only by stopping to think after the argument was over and the other man had gone home.	0.0110	Human Written	High
I once watched two masters argue about free will for three hours and neither one defined it, which tells you most of what went wrong.	0.0119	Human Written	High
When I was young a master told me that the worst student is not the dull one but the quick one who never checks his work, and I have found this true many times over.	0.0131	Human Written	High
He who teaches badly does more harm than he who never teaches at all, because the first man fills up the space the truth needed.	0.0152	Human Written	High
The miser does not love money properly speaking; he loves keeping it, which is a different sickness altogether and a worse one.	0.0200	Human Written	High
Habit is a strange thing: a man can pray every morning for thirty years and the one morning he skips it his whole day falls apart like a bad wheel.	0.0273	Human Written	High
A man told me envy is a small sin. I asked him whether he had ever watched it eat somebody from the inside out for twenty years.	0.0403	Human Written	High
The man who reads much and prays little is like a cook who tastes every dish but never sits down to eat, and he ends up starving on a full stomach.	0.0437	Human Written	High
Some men confuse stubbornness with constancy, but the stubborn man clings to his own opinion while the constant man clings to the truth, and these are not alike.	0.0583	Human Written	High
I knew a priest who could explain the Trinity better than anyone in Naples but could not forgive his own brother. Knowledge without charity is lame.	0.0698	Human Written	High
My old teacher once said that patience is not sitting still but holding firm, and I think he was right, because a rock sits still and nobody calls it patient.	0.1043	Human Written	Medium
People confuse humility with thinking yourself useless, but a good carpenter who knows he is good and thanks God for it is humbler than a lazy one who shrugs.	0.1774	Human Written	Medium
Nobody sins by doing what he truly thinks is good; the trouble is that a man can be badly wrong about what is good and still feel certain.	0.4145	Human Written	Medium
A brother complained that manual labor was beneath him since he was ordained. I told him Christ was a carpenter's son, and that shut him up for a week.	0.5129	Human Written	Low
A young friar asked why God permits fools to talk so much. I said perhaps so the rest of us learn when to shut our mouths.	0.5258	Human Written	Low
A man who confuses patience with laziness will never correct either fault, because he thinks doing nothing is already a virtue.	0.5282	Human Written	Low
Envy is the one vice nobody boasts about, which tells you that even the vicious know it is shameful to be wounded by another man's good.	0.6134	AI-Generated	Medium
Whether mercy belongs to the weak or the strong is settled easily: only the strong can afford it.	0.7829	AI-Generated	Medium
A student asked me whether God could make a stone he cannot lift. I told him the question is broken, not deep, like asking whether a painter can paint a smell.	0.7860	AI-Generated	Medium
Wrath disguises itself as justice so well that the angry man genuinely believes he is defending the good when he is only feeding his own wound.	0.8404	AI-Generated	Medium
Temperance is harder to praise than courage because nobody writes songs about the man who quietly refused a second cup of wine.	0.8587	AI-Generated	High
Nobody calls a blind man courageous for walking past a snake he never saw. Courage requires knowing the danger first.	0.8914	AI-Generated	Medium

25 samples
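As a sanity check, the summary row for simulated Aquinas at a 20-word minimum can be recomputed from these 25 rows. The scores below are copied from the table, split by the label Pangram assigned; the 0.5 comparison at the end is my own addition, used only to show that the label is not a plain cutoff on the score.

```python
from statistics import mean, median

# Scores from the sample table, grouped by the label shown next to each.
human_labeled = [
    0.0052, 0.0103, 0.0104, 0.0110, 0.0119, 0.0131, 0.0152, 0.0200, 0.0273,
    0.0403, 0.0437, 0.0583, 0.0698, 0.1043, 0.1774, 0.4145, 0.5129, 0.5258,
    0.5282,
]
ai_labeled = [0.6134, 0.7829, 0.7860, 0.8404, 0.8587, 0.8914]

scores = human_labeled + ai_labeled
tpr = len(ai_labeled) / len(scores)    # share labeled AI-Generated
above_half = sum(s > 0.5 for s in scores)

print(f"n = {len(scores)}")
print(f"mean score = {mean(scores):.4f}")
print(f"median score = {median(scores):.4f}")
print(f"TPR = {tpr:.1%}")
print(f"scores above 0.5 = {above_half}")
```

The mean (0.2949), median (0.0698), and TPR (24.0%) match the summary table. Nine scores are above 0.5 but only six carry the AI-Generated label, so three samples scoring between 0.51 and 0.53 were still labeled Human Written.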