Judgment & limits · hallucination · judgment · evaluation · reliability · 2026·06·08 · 4 min · evergreen

Why it makes things up: hallucination is a feature of the scoring, not a glitch you can prompt away

Edited by Luke Topfer | last reviewed 2026·06·08 | re-check by 2027·12·01

Overview

This is the mechanism behind the failure everyone has felt: the model states something false with total composure. Why now: in April 2026 the peer-reviewed version of OpenAI’s “Why Language Models Hallucinate” landed in Nature, arguing that hallucination is not a bug to be patched but a predictable consequence of how models are trained and graded. After this, you’ll understand why no prompt fully fixes it, and you’ll change what you check rather than what you type.

The content

The obvious read is that hallucination is a defect: the model is broken, and a better model or a sharper prompt will stamp it out. The research overturns that. Models are graded the way students are graded on a multiple-choice exam, where a blank scores zero but a guess has some chance of being right. Over thousands of questions, the confident guesser outscores the careful one who admits uncertainty. So that is the behaviour training rewards: when unsure, answer anyway.
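The arithmetic behind that incentive can be sketched in a few lines. This is a minimal illustration, not the paper's formulation: the function names and the `wrong_penalty` parameter are hypothetical, and the penalised variant is included only as a contrast to the accuracy-only grading described above.

```python
ABSTAIN_SCORE = 0.0  # a blank earns nothing under either scheme


def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected points for answering when the answer is right with probability p_correct.

    A correct answer earns 1 point; a wrong answer costs `wrong_penalty` points.
    wrong_penalty=0.0 is the accuracy-only grading described in the text.
    wrong_penalty > 0 is a hypothetical contrast, not something the text specifies.
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty


def best_move(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Return whichever move has the higher expected score."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.50):
        # Columns: chance of being right, best move with no penalty, best move with a 1-point penalty
        print(p, best_move(p), best_move(p, wrong_penalty=1.0))
```

With no penalty for a wrong answer, guessing wins at any nonzero chance of being right, however small. Abstaining only becomes the better move once a wrong answer costs something.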

Call it the test-taker problem. The model is optimised to be a good test-taker, and on the benchmarks that dominate leaderboards, guessing beats abstaining. OpenAI’s own comparison makes the trade-off visible: on the SimpleQA factual test, an older model answered almost everything and scored about 24% accuracy, but was wrong roughly 75% of the time. A newer model that abstained far more often landed at similar, slightly lower accuracy with an error rate near 26%. The “worse” scoreboard number came from the model that was honest about not knowing.
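To see what the accuracy figure hides, the numbers above can be unpacked, assuming every question ends up as exactly one of correct, wrong, or abstained. This is a rough sketch: the `breakdown` helper is hypothetical, and because the text says only that the newer model's accuracy was "similar", 24% is used for it here as a stand-in rather than a reported figure.

```python
def breakdown(accuracy_pct: float, error_pct: float) -> dict:
    """Split results into abstained vs attempted, assuming the three outcomes sum to 100%."""
    abstained = 100.0 - accuracy_pct - error_pct
    attempted = accuracy_pct + error_pct
    return {
        "abstained_pct": abstained,
        "attempted_pct": attempted,
        # Of the questions the model chose to answer, what share was wrong?
        "wrong_share_of_attempts": round(error_pct / attempted, 2),
    }


older = breakdown(accuracy_pct=24, error_pct=75)
newer = breakdown(accuracy_pct=24, error_pct=26)  # 24 is an assumed stand-in for "similar"

if __name__ == "__main__":
    print("older:", older)
    print("newer:", newer)
```

Under these assumptions the older model abstains on about 1% of questions and the newer one on about half. The accuracy column barely moves, while the share of answers you could be misled by drops sharply.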

There’s a deeper layer in pretraining. Facts that appear rarely and lack repeated support in the training data — a one-off date, an obscure person’s details — are inherently error-prone, because there’s nothing stable for the model to learn. In the paper, asked for an author’s birthday with instructions to answer only if known, a state-of-the-art open model (DeepSeek-V3) returned three different wrong dates across three tries. It was never reluctant. It had no reliable signal, and the scoring never taught it to say so.

This is why prompting “don’t make things up” only goes so far. You can shift the model’s threshold for guessing, but you cannot remove an incentive that was baked in upstream. The practical move is to stop treating fluency as evidence and start treating low-support facts — names, dates, figures, citations, quotes — as unverified until you check them. Confidence is not calibration.
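One way to make "unverified until checked" concrete is a crude first pass that pulls candidate low-support facts out of a draft for a human to verify. This is a sketch under loose assumptions: the patterns and the `flag_unverified` name are illustrative, they catch only years, percentages and quoted passages, and they will miss names and citations entirely. It narrows where to look; it verifies nothing.

```python
import re

# Illustrative patterns only; real drafts need broader coverage (names, citations, currencies).
PATTERNS = {
    "year": re.compile(r"\b(?:1[5-9]\d{2}|20\d{2})\b"),
    "percentage": re.compile(r"\b\d+(?:\.\d+)?%"),
    "quote": re.compile(r'"[^"]+"|“[^”]+”'),
}


def flag_unverified(draft: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs in the order they appear in the draft."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(draft):
            hits.append((match.start(), kind, match.group()))
    hits.sort()  # order by position in the draft
    return [(kind, text) for _, kind, text in hits]


if __name__ == "__main__":
    sample = 'Revenue grew 12.5% in 2021, and the CEO said "we are ahead of plan".'
    for kind, text in flag_unverified(sample):
        print(f"CHECK {kind}: {text}")
```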

Try it

Don’t try to prompt hallucination away — instead, force the model to expose where it’s guessing, then verify those points yourself. On your next real task that involves facts (a brief, a memo, a research summary), paste this after the model’s draft:

Review your previous answer. List every specific claim that
relies on a particular name, date, number, quotation, or
citation. For each, rate your confidence as High / Medium / Low
and say whether it came from something stable you've seen
repeatedly or something you may have inferred or guessed.
Do not defend the claims — just flag the weak ones for me to verify.

Where this won’t save you: the model can be confidently wrong about its own confidence, so a “High” rating is a prioritisation hint, not a guarantee. Treat the Medium and Low items as must-check, and spot-check the Highs anyway. The output is a verification queue, not a clearance certificate.
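The triage rule above can be sketched as a small helper, assuming you have already copied the model's flagged claims and ratings into a list by hand. The `Claim` type, the `build_queue` name and the `spot_check_fraction` default are all hypothetical choices for illustration, not part of any tool.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    confidence: str  # the model's self-rating: "High", "Medium" or "Low"


def build_queue(claims, spot_check_fraction=0.25, seed=0):
    """Split claims into (must_check, spot_check).

    Anything not rated High is must-check, Low first. Unrecognised ratings are
    treated as Low. A random sample of the Highs is spot-checked, and at least
    one is always sampled if any exist.
    """
    order = {"low": 0, "medium": 1}
    must_check = sorted(
        (c for c in claims if c.confidence.lower() != "high"),
        key=lambda c: order.get(c.confidence.lower(), 0),
    )
    highs = [c for c in claims if c.confidence.lower() == "high"]
    sample_size = math.ceil(len(highs) * spot_check_fraction)
    spot_check = random.Random(seed).sample(highs, sample_size)
    return must_check, spot_check


if __name__ == "__main__":
    claims = [
        Claim("Founded in 1998", "High"),
        Claim("Quoted 40% growth figure", "Low"),
        Claim("Cited clause 14.2", "Medium"),
        Claim("CEO named in paragraph 2", "High"),
    ]
    must, spot = build_queue(claims)
    print("Must check:", [c.text for c in must])
    print("Spot check:", [c.text for c in spot])
```

Even when every claim comes back rated High, the spot-check list is never empty, which matches the point that a High rating is a prioritisation hint and not a clearance.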

Additional reading

Evaluating large language models for accuracy incentivizes hallucinations (Nature, Apr 2026) — the peer-reviewed argument that accuracy-based grading rewards guessing over abstention.
Why Language Models Hallucinate (arXiv 2509.04664, Sep 2025) — the original preprint, including the binary-classification framing and the birthday example.
Why language models hallucinate (OpenAI, Sep 2025) — the plain-language summary with the SimpleQA accuracy-versus-error comparison.

Editor’s note

The test-taker framing made hallucination predictable for me: the model guesses because guessing scores. So I plan for the guess instead of prompting against it — names, numbers, quotes and citations are unverified until checked, no matter how fluent the prose around them. In contract work a confidently wrong clause reference is worse than no answer, because someone will rely on it. Verify the low-support facts; treat fluency as no signal at all.

✓ signed-off-by: Luke Topfer <editor> · 2026·06·08

06 Self-check

// three assertions against what you just read

assert 1/3

What actually drives a model to make things up when it doesn't know the answer?

assert 2/3

A model has drafted a client briefing for you, and it reads beautifully — specific names, dates, figures, a quoted stat. Before you rely on it, what does this module say to do?

assert 3/3

You run the confidence-check prompt over a draft and every date and citation comes back rated High. Where does that leave you?
