False Positive Rate
In AI detection, the false positive rate (FPR) is how often a detector incorrectly flags human-written text as AI-generated. It is the most consequential metric for academic and publishing deployments.
What Is the False Positive Rate?
The false positive rate (FPR) measures how often a detector incorrectly classifies a negative case as positive. In AI content detection, this means how often human-written text is wrongly flagged as AI-generated.
Formally: FPR = False Positives / (False Positives + True Negatives)
A detector with a 10% FPR will, on average, incorrectly accuse 1 in 10 human writers of using AI. At scale the numbers add up quickly: across 10,000 human-written submissions, that is roughly 1,000 false accusations.
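As a minimal sketch of the formula above, the following Python computes FPR from confusion counts and projects it onto a larger volume of submissions. The function names and the counts are illustrative assumptions, not figures from any benchmark.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN), computed over human-written (negative) samples only."""
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("FPR is undefined without any human-written samples")
    return false_positives / negatives


def expected_false_flags(fpr: float, human_submissions: int) -> float:
    """Expected number of human-written submissions that get wrongly flagged."""
    return fpr * human_submissions


# Hypothetical counts: 1,000 human-written essays, 100 of them flagged as AI.
fpr = false_positive_rate(false_positives=100, true_negatives=900)

# Hypothetical volume: 10,000 human-written submissions per term.
flags = expected_false_flags(fpr, human_submissions=10_000)

print(f"FPR = {fpr:.0%}, expected false flags = {flags:.0f}")
```

Note that AI-generated samples never enter the calculation: FPR is measured on human-written text alone.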
Why FPR Matters More Than Accuracy
Overall accuracy is the headline benchmark number, but FPR is the number that determines whether a tool is safe to deploy in high-stakes contexts. Accuracy averages over both AI-generated and human-written samples, so strong performance on AI text can mask a high error rate on human text. A detector with 90% accuracy but 20% FPR is dangerous: it will falsely accuse 1 in 5 students who wrote their own work.
Academic institutions, publishers, and moderation systems need to weigh FPR very carefully. The costs of the two error types are asymmetric: a false accusation carries reputational and legal consequences far more damaging than those of a missed AI-generated piece.
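To see how 90% accuracy and 20% FPR can coexist, here is a small worked example in Python. The test-set composition and the confusion counts are made up for illustration; the `ConfusionCounts` class is a hypothetical helper, not part of any detector's API.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # AI-generated text correctly flagged
    fn: int  # AI-generated text missed
    fp: int  # human-written text wrongly flagged
    tn: int  # human-written text correctly passed

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    @property
    def fpr(self) -> float:
        return self.fp / (self.fp + self.tn)


# Hypothetical test set: 750 AI-generated and 250 human-written documents.
counts = ConfusionCounts(tp=700, fn=50, fp=50, tn=200)

print(f"accuracy = {counts.accuracy:.0%}, FPR = {counts.fpr:.0%}")
```

In this made-up set, 900 of 1,000 documents are classified correctly, yet 50 of the 250 human authors are flagged. The more a test set is weighted toward AI-generated samples, the less a high FPR shows up in the accuracy figure.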
FPR Across Content Types
FPR varies significantly by writing domain. STEM academic writing has naturally low perplexity, and because low perplexity is a key signal that detectors read as AI-like, detectors calibrated on general corpora over-flag it as AI-generated. Our research shows FPR as high as 34% on STEM content with general-purpose detectors.
Non-native English speakers are also disproportionately affected. Studies from Stanford and MIT have shown that non-native academic writing exhibits lower lexical diversity and lower burstiness — both signals that detectors interpret as AI-like.
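One way to surface this kind of disparity is to report FPR per group rather than as a single pooled number. The sketch below assumes each human-written sample carries a group label (writing domain, or native versus non-native author); the labels and the tiny dataset are invented purely to show the mechanics.

```python
from collections import defaultdict


def fpr_by_group(samples):
    """Compute FPR per group.

    `samples` is an iterable of (group, flagged_as_ai) pairs and must
    contain human-written texts only, so every flag is a false positive.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, is_flagged in samples:
        total[group] += 1
        flagged[group] += int(is_flagged)
    return {group: flagged[group] / total[group] for group in total}


# Invented detector outputs on human-written samples (not real measurements).
samples = [
    ("stem", True), ("stem", False), ("stem", True), ("stem", False),
    ("humanities", False), ("humanities", False),
    ("humanities", False), ("humanities", True),
]

per_group = fpr_by_group(samples)
for group, rate in sorted(per_group.items()):
    print(f"{group}: FPR = {rate:.0%}")
```

A pooled FPR over these eight samples would hide the fact that one group is flagged twice as often as the other.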
Current Benchmark Numbers
Across our March 2026 benchmark of 1,200 human-written samples:
- Originality.ai: 7% FPR (best in class)
- GPTZero: 10% FPR
- Writer.com: 8% FPR
- Copyleaks: 12% FPR
- Sapling AI: 17% FPR
See the full benchmark comparison for methodology and per-category breakdowns.
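When reading figures like these, it helps to keep sampling uncertainty in mind. The sketch below computes a 95% Wilson score interval for each reported FPR, under the assumption (not stated above) that every tool was scored on all 1,200 human-written samples and that the false positive count can be recovered by rounding. The full benchmark methodology takes precedence if it differs.

```python
import math


def wilson_interval(false_positives: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95%."""
    p = false_positives / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


N_SAMPLES = 1200  # assumption: all tools scored on the same 1,200 samples

reported_fpr = {
    "Originality.ai": 0.07,
    "GPTZero": 0.10,
    "Writer.com": 0.08,
    "Copyleaks": 0.12,
    "Sapling AI": 0.17,
}

intervals = {}
for tool, fpr in reported_fpr.items():
    # Assumption: the count of false positives is the rounded product.
    count = round(fpr * N_SAMPLES)
    intervals[tool] = wilson_interval(count, N_SAMPLES)
    low, high = intervals[tool]
    print(f"{tool}: {fpr:.0%} (95% interval {low:.1%} to {high:.1%})")
```

At this sample size the intervals are on the order of one to two percentage points either side of the point estimate, so small gaps between tools should be interpreted with some caution.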
Relationship to False Negative Rate
The false negative rate (FNR) is how often AI-generated text passes as human-written. There is an inherent tradeoff between the two: making a detector more sensitive (lower FNR, catching more AI text) tends to increase FPR. Detector developers tune the decision threshold, the score above which a text is flagged, based on their target use case. Tools aimed at content moderation typically prioritize low FNR; tools aimed at academic integrity should prioritize low FPR.
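The tradeoff can be sketched with a simple threshold sweep. This assumes a detector that outputs a score between 0 and 1 and flags a text as AI when the score meets a threshold; the scores and function names below are invented for illustration and do not describe how any particular tool works.

```python
def rates_at_threshold(human_scores, ai_scores, threshold):
    """Flag a text as AI when score >= threshold. Returns (fpr, fnr)."""
    fp = sum(score >= threshold for score in human_scores)
    fn = sum(score < threshold for score in ai_scores)
    return fp / len(human_scores), fn / len(ai_scores)


def lowest_threshold_for_fpr(human_scores, ai_scores, max_fpr):
    """Return (threshold, fpr, fnr) for the lowest threshold with FPR <= max_fpr.

    Raising the threshold never increases FPR, so the lowest qualifying
    threshold is the one that misses the least AI text under the cap.
    """
    # An infinite threshold flags nothing, so the search always terminates.
    candidates = sorted(set(human_scores) | set(ai_scores)) + [float("inf")]
    for threshold in candidates:
        fpr, fnr = rates_at_threshold(human_scores, ai_scores, threshold)
        if fpr <= max_fpr:
            return threshold, fpr, fnr


# Invented detector scores (higher means "more AI-like").
human_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]
ai_scores = [0.30, 0.50, 0.60, 0.65, 0.75, 0.80, 0.85, 0.90, 0.95, 0.99]

strict = lowest_threshold_for_fpr(human_scores, ai_scores, max_fpr=0.10)
lenient = lowest_threshold_for_fpr(human_scores, ai_scores, max_fpr=0.30)

print("academic-integrity style (FPR <= 10%):", strict)
print("moderation style (FPR <= 30%):", lenient)
```

On this toy data, tightening the FPR cap from 30% to 10% pushes the threshold from 0.60 to 0.75 and doubles the FNR from 20% to 40%, which is the tradeoff described above in miniature.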