Workflows & iteration · iteration · verification · agents · efficiency · 2026·06·08 · 4 min · evergreen

The generate-critique-refine loop: stop accepting first drafts

Edited by Luke Topfer | last reviewed 2026·06·08 | re-check by 2026·12·15

Overview

Run the model’s output back through the model against explicit criteria before you accept it: generate, critique, refine. That habit used to be something you did by hand, turn by turn. As of mid-2026 the tools run the loop for you — Claude Code’s /goal (shipped May 2026) takes a measurable completion condition and iterates autonomously until it’s true, with /loop-style recurring runs alongside it. Which changes what the skill is: not running the loop, but writing the condition it checks against.

The content

Start with the manual move, because it still carries most everyday work in a chat tool. The instinct when a draft disappoints is to rewrite the prompt and regenerate from scratch, treating the first draft as waste. The loop treats it as raw material: generate a draft, ask the model to critique it against named criteria, then refine using that critique. Separating “judge this” from “rewrite this” forces specific, actionable problems to surface before anything gets rewritten. The early evidence was striking. The Self-Refine study (NeurIPS 2023, run on models several generations old) found roughly 20% average quality gains from exactly this pattern, with two durable details: generic feedback erased much of the gain, and extra rounds added progressively less.
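For readers who script their chat calls, the manual loop can be sketched in a few lines. This is a minimal illustration, not any tool's actual implementation: `model` is a hypothetical stand-in for whatever function sends a prompt and returns text, and the function names and prompt wording are assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical stand-in: any function that takes a prompt and returns text.
ModelFn = Callable[[str], str]


def critique_prompt(draft: str, criteria: List[str]) -> str:
    """Ask for a judgement against named criteria, with no rewriting."""
    bullets = "\n".join(f"- {c}" for c in criteria)
    return (
        "Evaluate this draft only against these criteria:\n"
        f"{bullets}\n"
        "List the specific places it falls short. Do not rewrite yet.\n\n"
        f"Draft:\n{draft}"
    )


def refine_prompt(draft: str, critique: str) -> str:
    """Ask for a rewrite that fixes only what the critique listed."""
    return (
        "Rewrite the draft to fix every issue in the critique, and nothing else.\n\n"
        f"Draft:\n{draft}\n\nCritique:\n{critique}"
    )


def generate_critique_refine(
    model: ModelFn, task: str, criteria: List[str], rounds: int = 2
) -> str:
    """Generate once, then alternate critique and refine for a few rounds."""
    if not criteria:
        # Without named criteria the critique has no yardstick.
        raise ValueError("name at least one criterion")
    draft = model(task)
    for _ in range(rounds):
        critique = model(critique_prompt(draft, criteria))
        draft = model(refine_prompt(draft, critique))
    return draft
```

The default of two rounds is a guess that reflects the diminishing returns noted above. Refusing to run with an empty criteria list is a design choice: it makes the generic "make it better" critique impossible to request by accident.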

The caveat is just as durable. Huang et al. (ICLR 2024) found that when models self-correct reasoning with nothing external to check against, accuracy can get worse. “Is this good? Make it better” produces confident reshuffling. The loop only works when the critique has a yardstick — your criteria, a rubric, a source document, test output — not the model’s untethered opinion of its own work.

Now the 2026 development, because the loop has been industrialised. Agent harnesses run generate-critique-refine natively: with Claude Code’s /goal you write a completion condition — “all tests passing, no TypeScript errors”, “Lighthouse score above 95” — and the agent plans, executes, verifies and iterates, sometimes for hours, until the condition is measurably true. /loop reruns work on a cadence; the same goal-directed pattern is spreading across tools. The human is no longer the loop’s engine. The human is the author of its stopping condition.

Read that against the ICLR finding and the lesson sharpens rather than ages: a vague goal handed to an autonomous loop is the self-correction failure at industrial scale — the agent will iterate confidently against nothing. The whole discipline compresses into one move: define “done” as something checkable before the loop starts. “Make it better” is not a condition. “Every claim has a source and the summary fits one page” is.
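To make "checkable" concrete, here is one way the condition "every claim has a source and the summary fits one page" might be written as code. The conventions are assumptions chosen for illustration: one claim per line, sources cited as bracketed numbers like [1], and roughly 500 words standing in for one page.

```python
import re
from typing import List

MAX_WORDS_ONE_PAGE = 500            # assumption: roughly one page of prose
CITATION = re.compile(r"\[\d+\]")   # assumption: sources are cited as [1], [2], ...


def unmet_conditions(summary: str) -> List[str]:
    """Return every way the summary fails the stated conditions."""
    problems = []
    words = len(summary.split())
    if words > MAX_WORDS_ONE_PAGE:
        problems.append(f"too long: {words} words, limit {MAX_WORDS_ONE_PAGE}")
    # Treat each non-empty line as one claim (an illustrative convention).
    for n, line in enumerate(summary.splitlines(), start=1):
        if line.strip() and not CITATION.search(line):
            problems.append(f"line {n} has no source: {line.strip()[:40]!r}")
    return problems


def is_done(summary: str) -> bool:
    """The stopping condition: done only when nothing is left unmet."""
    return not unmet_conditions(summary)
```

Returning the list of unmet conditions, rather than a bare yes or no, gives the next refinement round something specific to fix. Note the limit of such a check: it confirms a citation marker is present, not that the cited source supports the claim.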

Try it

In a chat tool, run the loop by hand on a draft you need today — and notice the criteria step is the whole game:

Here is a draft I need to improve:

[PASTE YOUR DRAFT]

Step 1 — Critique. Evaluate this draft only against these criteria:
- [e.g. lands the ask in the first two sentences]
- [e.g. no unsupported claims]
- [e.g. reads in under 60 seconds]
List the specific places it falls short. Quote the weak passages. Do not rewrite yet.

Step 2 — Refine. Now rewrite the draft to fix every issue you listed,
and nothing else. Show the revised version only.

If you work in an agent harness that supports goal-directed runs (Claude Code’s /goal and similar), those bullet points are exactly what you hand it as the completion condition — written as checks a machine could verify, not vibes.

Where this won’t help: if you can’t state the criteria, the loop has nothing to check against and you’ll get fluent churn — manually or autonomously. When the standard lives only in your head, write it down first; that act usually improves the draft more than the model does. And for factual or numerical work, the external feedback has to be a real source or a run of the numbers, not the model marking its own homework.

Additional reading

Keep Claude working toward a goal — Claude Code docs — the official /goal reference: a measurable completion condition, evaluated each turn, with turns and tokens tracked. Shipped in v2.1.139, 11 May 2026.
Scheduled tasks — Claude Code docs — the official /loop reference: recurring runs on a time interval.
Self-Refine: Iterative Refinement with Self-Feedback (NeurIPS 2023) — the original generate-feedback-refine result (~20% average gains); ran on models several generations old — read it for the pattern, not the scores.
Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024) — the failure mode that still governs: self-correction without external feedback can degrade accuracy.

Editor’s note

I rarely run this loop by hand any more now that /goal exists. The automation proved that looping was never the skill. For example, hand /goal a condition like “make the report better” and it will iterate to nowhere. Tell it to make sure that “every claim is sourced, it fits one page, and it takes two minutes to read” and it will move in the right direction. The trick is in making sure that your “done” criteria are machine-verifiable.

✓signed-off-by: Luke Topfer <editor> · 2026·06·08

06 Self-check

// three assertions against what you just read

assert 1/3

Why does the generate-critique-refine loop beat rewriting your prompt and regenerating a disappointing draft from scratch?

assert 2/3

You're handing a report-improvement task to a goal-directed agent like Claude Code's /goal, and it needs a completion condition. Which one does this module say to write?

assert 3/3

When does asking the model to critique and improve its own work risk making the output worse, not better?
