The landscape · mental-models · capability · efficiency · 2026·06·08 · 4 min · evergreen

The jagged frontier: why AI aces one task and bombs the next one that looks identical

Edited by Luke Topfer | last reviewed 2026·06·08 | re-check by 2027·12·01

Overview

This is a mental model for where AI capability actually lives — and why it is shaped nothing like you expect.

Why now. In March 2026 the Harvard–BCG “jagged frontier” study was formally published in Organization Science, a peer-reviewed management journal. That moves the headline finding from a much-shared working paper to a citable, durable result: across 758 consultants, AI lifted quality and speed on tasks inside its reach, yet on one task that sat just outside it, the people using AI were 19 percentage points less likely to get the right answer than the people with no AI at all.

What you’ll be able to do. Stop treating AI ability as a single dial that goes from “easy” to “hard”, and start mapping your own frontier — task by task — so you know which side of the line you’re standing on before you trust the output.

The content

The obvious mental model is a smooth line. Simple tasks are safe, complex tasks are risky, and somewhere in the middle is the limit of what AI can do. Pick work below the line; avoid work above it.

That picture is wrong, and the error is expensive.

Capability is not a line. It’s a coastline — a jagged frontier, in the study’s phrase, with deep inlets and jutting headlands. Two tasks that look equally hard to you can sit on opposite sides of it. Drafting a nuanced stakeholder email lands inside; a piece of arithmetic a teenager could check lands outside. The difficulty you perceive and the difficulty the model experiences are barely related, because the model’s competence tracks the shape of its training data and tooling, not your sense of effort.

The Harvard–BCG experiment shows what this costs in real work. The researchers picked 18 realistic consulting tasks they had pre-tested to sit inside the frontier; on those, AI users finished 12.2% more tasks, 25.1% faster, at markedly higher quality. Then they planted one task, a business problem requiring data and interview evidence to be reconciled, deliberately outside it. The consultants came from the same pool and used the same tool, and the work felt no harder. Yet AI users were 19 percentage points less likely to get it right. The danger wasn’t that AI failed. It was that it failed confidently, and people trusted it anyway.

So the skill isn’t “use AI” or “don’t”. It’s knowing the edge. The study found the strongest performers worked as “centaurs” and “cyborgs” — dividing work along the frontier, or weaving in and out of it — rather than handing over whole tasks blind. You can’t do that until you’ve mapped where your own line runs.

One caution that keeps this durable: the experiment itself ran on a 2023-era model, and the frontier has moved a long way since, with the gains inside it likely larger now. What hasn’t changed is the shape. The frontier moves with every release, and it moves unevenly. A headland that AI couldn’t reach last year may be deep inside this year, while a nearby inlet stays stubbornly out. So a map you drew six months ago is a starting hypothesis, not a fixed boundary. Re-test the edges; don’t assume them.
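One way to keep that map honest is to write it down with dates, so stale entries flag themselves. The sketch below is a minimal illustration, not a method from the study: the `FrontierEntry` fields, the `needs_retest` helper, the 180-day cutoff (a rough stand-in for "six months") and the example release date are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


# Hypothetical record of one mapped task; field names are illustrative only.
@dataclass
class FrontierEntry:
    task: str
    verdict: str      # "inside", "outside" or "edge", based on a check you ran
    check: str        # the concrete check you ran on real output
    tested_on: date   # when you last ran that check


def needs_retest(entry: FrontierEntry, today: date,
                 last_model_release: Optional[date],
                 max_age_days: int = 180) -> bool:
    """Flag an entry tested before the latest release or older than max_age_days."""
    tested_before_release = (
        last_model_release is not None and entry.tested_on < last_model_release
    )
    too_old = (today - entry.tested_on) > timedelta(days=max_age_days)
    return tested_before_release or too_old


entries = [
    FrontierEntry("stakeholder email draft", "inside",
                  "read against my own notes", date(2026, 5, 20)),
    FrontierEntry("budget arithmetic", "outside",
                  "recompute totals in a spreadsheet", date(2025, 11, 1)),
]

today = date(2026, 6, 8)
release = date(2026, 4, 1)   # assumed release date, purely for illustration

stale = [e.task for e in entries if needs_retest(e, today, release)]
print(stale)  # ['budget arithmetic']
```

The design choice is that a verdict never outlives the model it was tested on: any release after `tested_on` sends the entry back to "unverified", whichever side of the frontier it sat on before.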

Try it

Don’t try to memorise the frontier — probe it on your own work. Take three tasks you’d plausibly hand to AI this week and pressure-test where each one sits, before you rely on the output.

I'm mapping where AI is reliable versus risky across my actual work.
Here are three tasks I'm considering using you for this week:

1. [task one — paste the real task]
2. [task two]
3. [task three]

For each task, tell me:
- Is this likely INSIDE or OUTSIDE your reliable frontier, and why —
  in terms of what the task actually requires (current facts, private
  context I haven't given you, multi-step reasoning that compounds
  errors, verifiable ground truth)?
- What's the specific failure mode if it's outside — what would a
  wrong answer look like, and would it look confident?
- One concrete check I could run to catch that failure myself.

Don't be reassuring. I want the honest edge, not encouragement.

Where this breaks: the model is guessing at its own limits, so treat its self-assessment as a hypothesis to verify, not a verdict. The real test is the check it gives you in answer to the final bullet: run that on the actual output. If a task involves current events, private data you didn’t supply, or arithmetic that has to be exactly right, assume it’s near the edge regardless of what the model claims.
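For arithmetic that has to be exactly right, the check can be as simple as recomputing the figure yourself and comparing. The sketch below is one possible shape for that check, under stated assumptions: `check_total` is a hypothetical helper, and the line items and the claimed total are invented numbers, not data from the study.

```python
from decimal import Decimal


def check_total(line_items, claimed_total):
    """Recompute a sum independently and compare it with the figure the AI reported.

    Amounts are passed as strings so Decimal keeps them exact.
    """
    actual = sum((Decimal(x) for x in line_items), Decimal("0"))
    return actual == Decimal(claimed_total), actual


# Hypothetical budget lines and a hypothetical total taken from an AI answer.
items = ["1249.99", "830.50", "412.75"]
ok, actual = check_total(items, "2493.34")

print(ok, actual)  # False 2493.24
```

Using `Decimal` with string inputs avoids binary floating-point rounding, so a mismatch means the reported figure is wrong rather than the check being imprecise.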

Additional reading

Navigating the Jagged Technological Frontier — Organization Science (peer-reviewed, March 2026) — the formally published study; 758 BCG consultants, the inside-frontier gains and the 19-point outside-frontier penalty. Note the experiment ran on a 2023-era model — read it for the shape of the frontier, not today’s coastline.
Harvard Business School AI Institute summary — plain-language overview, including the “centaur” and “cyborg” working patterns.
Working Paper 24-013 (full paper, SSRN) — the complete methodology and task design, for the detail behind the headline numbers.

Editor’s note

The finding that sticks is the direction of the harm: outside the frontier, people using AI did worse than people without it. That matches my experience — the same model that built me a working arcade game in an afternoon will fumble a task I would have rated far easier, and the wins train you to stop checking. So I treat my sense of the frontier as a hypothesis with a short shelf life. Re-test the edge after every model release; never carry trust across it.

✓ signed-off-by: Luke Topfer <editor> · 2026·06·08

Self-check

assert 1/3

Why can the same AI ace one task and then bomb another that looks just as hard?

assert 2/3

You have three tasks you could hand to AI this week — a nuanced stakeholder email, some budget arithmetic, and a report summary. What's the move before you rely on anything it produces?

assert 3/3

You run this module's mapping prompt and the model declares a task safely inside its frontier. Where does that leave you?
