Hapax Legomenon
A hapax legomenon is a word that appears exactly once in a given text or corpus. In AI detection, hapax legomenon rate is a measurable signal: human writing tends to contain more hapax legomena than AI-generated text of comparable length.
What Is a Hapax Legomenon?
A hapax legomenon (from Greek: "said only once", plural: hapax legomena) is a word that appears exactly once in a given text or corpus. The term is used in linguistics, biblical studies, computational linguistics, and — increasingly — AI content detection.
Examples
In English, many rare or highly specific words become hapax legomena in shorter texts. For example, in a 500-word essay about urban planning, a writer might use "carbuncle" or "liminal" exactly once: contextually appropriate, but rare. In a larger corpus such as the entire King James Bible, the name "Muppim" appears exactly once (Genesis 46:21), which makes it a classic hapax legomenon.
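To make the definition concrete, here is a minimal Python sketch that lists the hapax legomena in a short passage. The tokenizer (lowercasing, letters and apostrophes only) and the function name `hapax_legomena` are illustrative assumptions; different tokenization choices will give different results.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return the sorted list of words that occur exactly once in text."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)

sample = "The city grew and the city changed, but the old square stayed liminal."
print(hapax_legomena(sample))
```

In this sample, "the" and "city" repeat, so every other word, including "liminal", counts as a hapax legomenon.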
Why Hapax Rate Matters in AI Detection
Human writing tends to include more hapax legomena than AI-generated text. This is because human writers draw on personal vocabulary, use contextually specific word choices, make idiosyncratic stylistic decisions, and occasionally use rare or specialized terms that don't recur.
AI language models, by contrast, are trained to produce statistically likely token sequences. They tend to over-represent common vocabulary and under-represent rare words — producing text with a lower hapax legomenon rate than comparable human writing.
Hapax rate is closely related to the type-token ratio (TTR). Hapax rate measures only the words that occur once, commonly expressed as hapax count / total words; TTR measures overall vocabulary diversity (unique words / total words).
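The two metrics can be computed side by side. The sketch below assumes hapax rate is normalized by total word count (some formulations divide by unique words instead) and uses a simple regex tokenizer; the function name `vocabulary_metrics` is hypothetical. Both values shrink as texts get longer, so they are best compared across texts of similar length.

```python
import re
from collections import Counter

def vocabulary_metrics(text):
    """Compute hapax rate and type-token ratio for a piece of text."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"tokens": 0, "types": 0, "hapax": 0, "hapax_rate": 0.0, "ttr": 0.0}
    counts = Counter(tokens)
    hapax = sum(1 for n in counts.values() if n == 1)
    return {
        "tokens": len(tokens),                 # total words
        "types": len(counts),                  # unique words
        "hapax": hapax,                        # words occurring exactly once
        "hapax_rate": hapax / len(tokens),     # assumption: normalized by total words
        "ttr": len(counts) / len(tokens),      # unique words / total words
    }

sample = "The city grew and the city changed, but the old square stayed liminal."
print(vocabulary_metrics(sample))
```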
In Practice: Detection Signal Strength
Hapax legomenon rate is a useful supplementary signal in AI detection but not a primary one. Its value is highest when combined with perplexity and burstiness scores. Alone, it is too easily gamed — a humanizer tool that inserts occasional rare words will raise the hapax rate without meaningfully changing the text's statistical profile.
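The gaming problem is easy to reproduce. The following sketch, using a hypothetical `hapax_rate` helper and made-up sample text, shows that appending two rare words raises the hapax rate even though the rest of the text is unchanged. It illustrates the weakness only; it does not model how any particular humanizer tool or detector works.

```python
import re
from collections import Counter

def hapax_rate(text):
    """Share of word tokens that occur exactly once (hapax count / total words)."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for n in counts.values() if n == 1) / len(tokens)

base = "the model writes the text and the model checks the text"
padded = base + " carbuncle liminal"  # two rare words tacked on, nothing else changed

print(f"base:   {hapax_rate(base):.3f}")
print(f"padded: {hapax_rate(padded):.3f}")
```

The original eleven words are untouched, yet the padded version scores higher, which is why the metric is more informative alongside other signals than on its own.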
See our benchmark for how leading detectors incorporate vocabulary-based signals.