watts.it.com // daily AI micro-learning
The landscape · mental-models · capability · efficiency · 2026·06·08 · 4 min · evergreen

The jagged frontier: why AI aces one task and bombs the next one that looks identical

Overview

This is a mental model for where AI capability actually lives — and why it is shaped nothing like you expect.

Why now. In March 2026 the Harvard–BCG “jagged frontier” study was formally published in Organization Science, a peer-reviewed management journal. That moves the headline finding from a much-shared working paper to a citable, durable result: across 758 consultants, AI lifted quality and speed on tasks inside its reach, yet on one task that sat just outside it, the people using AI were 19 percentage points less likely to get the right answer than the people with no AI at all.

What you’ll be able to do. Stop treating AI ability as a single dial that goes from “easy” to “hard”, and start mapping your own frontier — task by task — so you know which side of the line you’re standing on before you trust the output.

The content

The obvious mental model is a smooth line. Simple tasks are safe, complex tasks are risky, and somewhere in the middle is the limit of what AI can do. Pick work below the line; avoid work above it.

That picture is wrong, and the error is expensive.

Capability is not a line. It’s a coastline — a jagged frontier, in the study’s phrase, with deep inlets and jutting headlands. Two tasks that look equally hard to you can sit on opposite sides of it. Drafting a nuanced stakeholder email lands inside; a piece of arithmetic a teenager could check lands outside. The difficulty you perceive and the difficulty the model experiences are barely related, because the model’s competence tracks the shape of its training data and tooling, not your sense of effort.

The Harvard–BCG experiment shows what this costs in real work. The researchers picked 18 realistic consulting tasks they had pre-tested to sit inside the frontier; on those, AI users finished 12.2% more tasks, 25.1% faster, at markedly higher quality. Then they planted one task, a business problem that required reconciling data with interview evidence, deliberately outside it. The consultants came from the same pool and used the same tool, and the work felt no harder, yet AI users were 19 percentage points less likely to reach the right answer. The danger wasn’t that AI failed. It was that it failed confidently, and people trusted it anyway.

So the skill isn’t “use AI” or “don’t”. It’s knowing the edge. The study found the strongest performers worked as “centaurs” and “cyborgs” — dividing work along the frontier, or weaving in and out of it — rather than handing over whole tasks blind. You can’t do that until you’ve mapped where your own line runs.

One caution that keeps this durable: the experiment itself ran on a 2023-era model, and the frontier has moved a long way since; the gains inside it have likely only grown. What hasn’t changed is the shape. The frontier moves with every release, and it moves unevenly. A headland that AI couldn’t reach last year may be deep inside this year, while a nearby inlet stays stubbornly out. So a map you drew six months ago is a starting hypothesis, not a fixed boundary. Re-test the edges; don’t assume them.
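If you keep your map in a file rather than in your head, the "starting hypothesis" idea can be made mechanical. The sketch below is only an illustration, not anything taken from the study: the FrontierEntry record, its field names, the release labels, and the 180-day cut-off (a stand-in for "six months") are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record: one row per task on your personal frontier map.
@dataclass
class FrontierEntry:
    task: str
    inside: bool         # did the output pass your own check last time?
    model_release: str   # label of the model release you tested against
    tested_on: date


def needs_retest(entry, current_release, today, max_age=timedelta(days=180)):
    """An entry is stale if the model has changed or the last test is old."""
    changed_model = entry.model_release != current_release
    too_old = (today - entry.tested_on) > max_age
    return changed_model or too_old


# Made-up entries, purely for illustration.
entries = [
    FrontierEntry("stakeholder email draft", True, "model-b", date(2026, 5, 1)),
    FrontierEntry("reconcile data with interviews", False, "model-a", date(2026, 5, 20)),
    FrontierEntry("quarterly figures arithmetic", False, "model-b", date(2025, 10, 1)),
]

stale = [e.task for e in entries if needs_retest(e, "model-b", date(2026, 6, 8))]
print(stale)
```

Note that the check ignores which side of the line an entry was on. A task recorded as outside is re-tested just as readily as one recorded as inside, because the frontier can move in either direction.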

Try it

Don’t try to memorise the frontier — probe it on your own work. Take three tasks you’d plausibly hand to AI this week and pressure-test where each one sits, before you rely on the output.

I'm mapping where AI is reliable versus risky across my actual work.
Here are three tasks I'm considering using you for this week:

1. [task one — paste the real task]
2. [task two]
3. [task three]

For each task, tell me:
- Is this likely INSIDE or OUTSIDE your reliable frontier, and why —
  in terms of what the task actually requires (current facts, private
  context I haven't given you, multi-step reasoning that compounds
  errors, verifiable ground truth)?
- What's the specific failure mode if it's outside — what would a
  wrong answer look like, and would it look confident?
- One concrete check I could run to catch that failure myself.

Don't be reassuring. I want the honest edge, not encouragement.

Where this breaks: the model is guessing at its own limits, so treat its self-assessment as a hypothesis to verify, not a verdict. The real test is the check it gives you in answer to the final bullet of the prompt; run that on the actual output. If a task involves current events, private data you didn’t supply, or arithmetic that has to be exactly right, assume it’s near the edge regardless of what the model claims.
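That last rule of thumb is simple enough to write down as an override on whatever the model says about itself. This is a rough sketch under stated assumptions: the Task record, its three flags, and the label strings are hypothetical names chosen for illustration, and the three flags just mirror the three risk factors named above.

```python
from dataclasses import dataclass


# Hypothetical task description; the flags mirror the three risk factors above.
@dataclass
class Task:
    name: str
    needs_current_events: bool = False
    needs_private_data: bool = False      # data you did not supply in the prompt
    needs_exact_arithmetic: bool = False


def assumed_position(task, model_says_inside):
    """Return a working label for the task; risk flags beat the model's claim."""
    risky = (
        task.needs_current_events
        or task.needs_private_data
        or task.needs_exact_arithmetic
    )
    if risky:
        return "near the edge"
    # With no risk flag set, the model's answer is still only a hypothesis.
    return "hypothesis: inside" if model_says_inside else "hypothesis: outside"


email = Task("draft stakeholder email")
budget = Task("sum the budget lines", needs_exact_arithmetic=True)

print(assumed_position(email, model_says_inside=True))
print(assumed_position(budget, model_says_inside=True))
```

Even the "inside" label stays a hypothesis here; the function only decides how suspicious to be before you run the check, not whether the output is right.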

Editor’s note

The finding that sticks is the direction of the harm: outside the frontier, people using AI did worse than people without it. That matches my experience — the same model that built me a working arcade game in an afternoon will fumble a task I would have rated far easier, and the wins train you to stop checking. So I treat my sense of the frontier as a hypothesis with a short shelf life. Re-test the edge after every model release; never carry trust across it.

signed-off-by: Luke Topfer <editor> · 2026·06·08