Hapax Legomenon

What Is a Hapax Legomenon?

A hapax legomenon (from Greek, "said only once"; plural: hapax legomena) is a word that appears exactly once in a given text or corpus. The term is used in linguistics, biblical studies, computational linguistics, and, increasingly, AI content detection.
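The definition can be turned into a few lines of code. The sketch below is only an illustration: the function name `hapax_legomena` and the tokenization rule (lowercase the text, treat runs of letters and apostrophes as words) are assumptions, and a different tokenizer would produce a different list.

```python
import re
from collections import Counter


def hapax_legomena(text: str) -> list[str]:
    """Return the words that occur exactly once, in order of first appearance."""
    # Assumption: case-insensitive matching; a word is a run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter keeps insertion order, so the result follows first appearance.
    return [word for word, n in counts.items() if n == 1]


sample = "The cat sat on the mat while the dog sat nearby"
print(hapax_legomena(sample))
# ['cat', 'on', 'mat', 'while', 'dog', 'nearby']
```

Here "the" and "sat" repeat, so they are excluded; every other word occurs once and counts as a hapax legomenon of this one-sentence text.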

Examples

In English, many rare or highly specific words become hapax legomena in shorter texts. For example, in a 500-word essay about urban planning, a writer might use "carbuncle" or "liminal" exactly once: contextually appropriate, but rare. In a corpus like the entire King James Bible, the name "Muppim" appears exactly once (Genesis 46:21), which makes it a classic hapax legomenon. Its neighbor in that verse, "Huppim", recurs in 1 Chronicles 7 and so does not qualify.

Why Hapax Rate Matters in AI Detection

Human writing tends to include more hapax legomena than AI-generated text. This is because human writers draw on personal vocabulary, use contextually specific word choices, make idiosyncratic stylistic decisions, and occasionally use rare or specialized terms that don't recur.

AI language models, by contrast, are trained to produce statistically likely token sequences. They tend to over-represent common vocabulary and under-represent rare words, which yields text with a lower hapax legomenon rate than comparable human writing.

This connects to a related metric, the type-token ratio (TTR). Hapax rate counts only the words that occur exactly once; TTR measures overall vocabulary diversity (unique words / total words).
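A minimal sketch of both metrics, under stated assumptions: the hapax rate here is the number of once-occurring words divided by total words (some definitions divide by vocabulary size instead), and the function name `vocabulary_stats` and the simple regex tokenizer are illustrative choices, not a standard.

```python
import re
from collections import Counter


def vocabulary_stats(text: str) -> dict[str, float]:
    """Compute hapax rate and type-token ratio for a text."""
    # Assumption: lowercase, words are runs of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"hapax_rate": 0.0, "ttr": 0.0}
    counts = Counter(tokens)
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return {
        # Assumed denominator: total words (tokens).
        "hapax_rate": hapaxes / len(tokens),
        # TTR: unique words / total words.
        "ttr": len(counts) / len(tokens),
    }


print(vocabulary_stats("a b b c"))
# {'hapax_rate': 0.5, 'ttr': 0.75}
```

Both values depend on text length, since longer texts repeat more words, so comparisons are only meaningful between samples of similar size.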

In Practice: Detection Signal Strength

Hapax legomenon rate is a useful supplementary signal in AI detection, but not a primary one. Its value is highest when combined with perplexity and burstiness scores. Alone, it is too easily gamed: a humanizer tool that inserts occasional rare words will raise the hapax rate without meaningfully changing the text's statistical profile.
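To make the gaming concern concrete, the toy example below appends two arbitrary rare words to a repetitive sentence and recomputes the rate. Everything in it is an assumption for illustration: the `hapax_rate` helper, the tokens-based denominator, and the sample strings. It does not model any real humanizer tool.

```python
import re
from collections import Counter


def hapax_rate(text: str) -> float:
    """Once-occurring words divided by total words (assumed definition)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for n in counts.values() if n == 1) / len(tokens)


original = "the model writes the text and the model checks the text"
# Hypothetical "humanized" version: same sentence with two rare words tacked on.
padded = original + " perspicacious liminal"

print(f"original: {hapax_rate(original):.3f}")  # 3 hapaxes / 11 words
print(f"padded:   {hapax_rate(padded):.3f}")    # 5 hapaxes / 13 words
```

The rate rises even though the repetitive core of the sentence is untouched, which is why the metric works better alongside other signals than on its own.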

See our benchmark for how leading detectors incorporate vocabulary-based signals.
