We Built Agile to Manage Slow Building. AI Just Made Building Fast.

How we build software in the AI era — the operating model that replaces Scrum, sprints, and story points with Kanban, human-and-agent flow, and verification.

TL;DR — Scrum was an excellent answer to a question that's quietly going away: how do we manage the risk and cost of building software when building is slow? AI collapses that cost. When building stops being the bottleneck, the rituals that existed to protect slow building — sprints, estimation, velocity, standups-as-status — turn into overhead. The bottleneck moves to the two ends nobody automated: deciding what to build, and verifying what got built. The model that fits this world isn't a faster Scrum. It's a pull-based, continuous-flow system — Kanban — where work-in-progress is capped by human review capacity (the new scarce resource), humans become Architects, Orchestrators, and Reviewers, and agents do the building. This post lays out the whole model and ends with files you can copy into a repo and run on Monday.

The uncomfortable part first

This is a post about how we build software in the AI era — not the tools, but the operating model: the roles, the rituals, the cadence, the way work actually moves through a team. Every framework we use for that today — agile, Scrum, SAFe, the whole software development lifecycle as we teach it — was shaped by one assumption. I want to name that assumption, show you it's dissolving, and lay out what replaces it.

I spent years inside Scrum. Sprint planning, story points, velocity charts, the daily standup, the retro. It worked. It genuinely worked — for the world it was built for.

That world had one defining property: building software was the expensive, slow, risky part. A feature took weeks. A wrong guess cost a sprint. So we invented an entire apparatus to manage that risk. We sliced work small so we could course-correct often. We estimated so we could plan around the slowness. We time-boxed so the slowness had a rhythm. We held a daily standup because work moved slowly enough that a once-a-day sync was the right resolution.

Every one of those practices is a response to the cost of building. Take the cost of building toward zero and you don't get a better Scrum. You get a Scrum solving a problem you no longer have.

That's where we are. Not "AI helps you code faster" — that framing is too small. The right framing is: the constraint that shaped two decades of how we organize software work is dissolving, and the method has to move with it.

AI is an amplifier, not an accelerator

Here's the first thing to get right, because everything else depends on it.

AI does not make your team faster in some uniform way. AI amplifies whatever your team already is. A team with a clear sense of what's worth building, clean interfaces, good taste, and real review discipline gets dramatically more leverage. A team that's fuzzy on what it's building, drowning in tech debt, and rubber-stamping reviews gets more of all of that — faster.

  SAME TOOL. OPPOSITE OUTCOME.

  Solid foundation  + AI   ████████████████████  →  leverage
  Shaky foundation  + AI   ████████████████████  →  waste

                           ▲ identical amplifier, pointed at different signal

Point an amplifier at signal and you get louder signal. Point it at noise and you get louder noise. This is why the same model produces a 10x team in one company and a pile of unmaintainable, half-working code in another. The tool isn't the variable. The foundation is.

So the most important work in adopting AI isn't picking the model — it's fixing the foundation the amplifier is pointed at. The rest of this post is mostly about that foundation.

The failure mode has a name worth remembering: mass production of waste. Speed up building but don't fix deciding and verifying, and you don't get more value — you get more code nobody needed, shipped faster than anyone can check it. That's not a hypothetical. It's the default outcome of dropping agents into an unchanged process.

Find the bottleneck. It moved.

Picture the full loop a product actually runs through:

   ENVISION  ──▶  ALIGN  ──▶  BUILD  ──▶  VERIFY  ──▶  LEARN
   what's        get           design,     does it      what does
   worth         everyone      write,      work? is     the market
   building      aligned       test        it right?    tell us?

For twenty years, BUILD was the fat part of that pipe. It dominated lead time, so we optimized it obsessively. Every agile practice you know is a build-phase optimization.

  WHERE LEAD TIME WENT — the old world

  Envision  ██
  Align     ███
  Build     ████████████████████████████   ◀── the bottleneck
  Verify    ████
  Learn     ███

AI is draining the build phase. Not entirely, not perfectly — but enough that it's no longer the constraint. And the iron law of any system is: relieve one bottleneck and the constraint moves; it doesn't disappear. Drain BUILD and the slow parts are suddenly visible on both ends:

  WHERE LEAD TIME GOES — now

  Envision  ████████████      ◀── now the bottleneck (judgment, taste, politics)
  Align     ████████          ◀── now the bottleneck (getting humans to agree)
  Build     ███               ◀── collapsed
  Verify    ██████████        ◀── now the bottleneck (10x output to check)
  Learn     ████████          ◀── always was slow (runs on customer time)

ENVISION and ALIGN — what's actually worth building, and getting humans to agree on it. No model decides this for you. Judgment, taste, politics. Human time.
VERIFY — confirming the thing is correct, safe, and solves the problem. Ten times more built means ten times more to check. Leave verification as "a human reviews every line" and it becomes the new wall everything piles up against.
LEARN — market validation runs on human timescales. You can't make customers adopt and tell you the truth faster than they will. Always slow; stays slow.

So the new game isn't "build faster." It's make the whole loop flow — especially the parts that never got automated. A method designed to optimize the build phase is now optimizing the one phase that's no longer the problem.

The new roles: humans stop being the ones who build

If agents build, what are people for? This is the question that scares everyone, and the answer is more interesting than "fewer jobs."

The work doesn't vanish. It moves up a level. A person on an AI-native team cycles through three roles, often in a single afternoon. Know which one you're in — the behavior is completely different in each.

                  ┌─────────────────────────────┐
                  │          ARCHITECT          │
                  │   designs the environment   │
                  │   the agent works inside    │
                  │   owns the "-ilities"       │
                  └──────────────┬──────────────┘
                                 │ frames the work
                                 ▼
  ┌───────────────────────┐               ┌───────────────────────┐
  │     ORCHESTRATOR      │               │       REVIEWER        │
  │  triggers agent work  │               │  approves at the edge │
  │  decomposes the task  │               │  of their expertise   │
  │  steers mid-flight    │               │  owns the ship call   │
  └──────────┬────────────┘               └───────────▲───────────┘
             │ prompt + context                       │ artifact
             ▼                                        │
                    ┌──────────────────────┐
                    │        AGENT         │
                    │  executes            │
                    │  proposes, never     │
                    │  decides             │
                    └──────────────────────┘

The Architect designs the environment an agent runs inside — interfaces, constraints, context, the quality bar. The test of good architecture is brutal and simple: if a fresh agent joined today with a single task and only the repo to read, could it figure out what to do? If no, the architecture is incomplete. The Architect owns the things agents are bad at caring about on their own — performance, security, extensibility, cost — and is pulled in at the start, when the work is still a skeleton, not at the end as a reviewer. Reshaping the frame after an agent has built inside the wrong one is the most expensive mistake in this model.

The Orchestrator triggers and steers the agent. Decomposes a fuzzy goal into work an agent can execute, supplies the right context, corrects course as output appears. This is the role most people spend most of their time in. It's a real skill — closer to managing a brilliant, fast, literal-minded junior than to writing code.

The Reviewer approves output at the boundary of their own expertise and owns the decision to ship. The role you cannot fake. A reviewer who can't meaningfully evaluate the output is not a reviewer — they're a rubber stamp, and a rubber stamp is the single most dangerous object on an AI-native team. If a human can't truly review a piece of work, the answer isn't "approve it anyway." It's "find a human who can," or "don't ship it yet."

The Agent executes. It proposes; it never disposes. Agents don't own decisions. Ever.

Notice what's gone: the executor — the person whose job was to take a well-specified ticket and turn it into code. That role is being absorbed. The people who held it don't disappear; they move up into Architect, Orchestrator, and Reviewer. But the move is not automatic and not free. It's the central career transition of this decade, and pretending otherwise helps no one.

The map: every Scrum practice, and what replaces it

Here's the whole translation in one place. Read the why column — that's where the argument lives. Each row is the same logic: a practice that existed to manage slow building, replaced by one that manages flow, judgment, or verification instead.

Scrum / classic agile	AI-era replacement	Why it changes
Multi-week sprints	Continuous flow — work pulled as capacity frees	Sprints gave slow building a rhythm and a commitment boundary. When a feature takes hours, a two-week box is just latency you added on purpose.
Story-point estimation	Radius sizing — name the change, not the calendar	Estimation is a bet on how long building takes. Build time is collapsing and turning non-deterministic. The ROI of estimating drops to near zero. Size by blast radius.
Velocity tracking	Cycle time + flow	Velocity measures build throughput — the thing no longer scarce. Measure how fast an idea travels the whole loop instead.
Daily standup as status	Async board + intentional human touchpoints	A daily cadence matches daily-moving work. Agents move continuously, in parallel; status lives on the board. Keep the sync — for coordination and morale, not status.
Developer = code writer	Developer = architect + orchestrator + reviewer	Agents write most of the code. Human value moves to framing, steering, judging.
Product Owner = backlog manager	Product Owner = discovery + strategy	When building isn't the bottleneck, grooming build-tasks is low-leverage. Leverage is in what's worth building at all.
Scrum Master = process enforcer	Coach of human-agent teams (or the role dissolves)	Most friction the SM smoothed was build-ceremony friction. Less ceremony, less to enforce. What remains is coaching genuinely new skills.
Humans review every change	Layered agent review + humans at the edges	Reviewing everything by hand becomes the new wall. Let agents handle the routine pass; concentrate human attention where humans add judgment.
Team of 5–9	1–3 humans + a swarm of agents	Large teams parallelized slow manual building. Agents parallelize for nearly free. The team shrinks; the challenge becomes coordinating across many small teams.
"Working software over docs"	Documentation as the agent's input	Docs written for humans who'll never read them are overhead. Docs an agent reads to do the work right are the steering wheel. Write the "why," grow it through dialogue.
Done = code merged	Done = verified against a signal	Merged code nobody wanted is waste that compiled. Done means the loop closed: shipped and something real told you it mattered.

If you take one thing from the table: almost every classic agile practice is a build-phase optimization in disguise. Not a criticism — it was the right thing to optimize. It's just no longer the bottleneck.

Why Kanban, specifically

Plenty of people now agree sprints are awkward in an AI world. Fewer say what to do instead. My answer is Kanban — not as nostalgia, but because its core mechanics are exactly what human-and-agent collaboration needs. Let me make the case piece by piece, because this is the heart of the whole thing.

1. Pull, not push

Sprint planning is a push system: forecast a batch and push it into a time-box. That only makes sense when capacity is predictable and workers are scarce. Neither holds anymore. Signals, bugs, and ideas arrive continuously, not on a planning cadence. And agent capacity is on tap — you don't reserve it two weeks out. Kanban pulls: when capacity frees, the next most valuable thing comes in. The question is never "is an agent free?" (always yes) but "is a human free to frame and review this?"

2. WIP limits — and the limit is human review capacity

This is the single most important idea in the whole post, so slow down here.

In classic Kanban, you cap work-in-progress to expose bottlenecks and stop everyone starting things and finishing nothing. In an AI-native team the limit has a sharper meaning: it's a cap on how much agent output your humans can meaningfully Architect, steer, and Review at once.

  THE WIP REALITY

  Agent output    ████████████████████   20 PRs in an hour
  Human review    ██                      2 at a time
                  └────────────────────┘
                   everything past here is not progress —
                   it's inventory rotting in a queue
                   (or worse: getting rubber-stamped)

Building is no longer the constraint. Human judgment is. WIP limits make the new bottleneck visible and respected. They force the question "who can actually review this?" before the work is generated, not after it piles up. A team that ignores WIP limits in the AI era doesn't move faster — it mass-produces waste and calls it velocity.
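To make the pull rule concrete, here is a minimal Python sketch of a board whose WIP limit is set to human review capacity. The `Board` class, its field names, and the `PR-n` card names are hypothetical and chosen only for illustration; a real board would also track columns and classes of service.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Board:
    """Pull-based board whose WIP limit equals human review capacity (illustrative)."""
    review_capacity: int                              # items humans can review at once
    backlog: deque = field(default_factory=deque)     # most valuable card first
    in_flight: list = field(default_factory=list)     # being built or awaiting review

    def pull(self) -> list:
        """Pull cards only while a human review slot is free; return what was pulled."""
        pulled = []
        while self.backlog and len(self.in_flight) < self.review_capacity:
            card = self.backlog.popleft()
            self.in_flight.append(card)
            pulled.append(card)
        return pulled

    def finish(self, card: str) -> list:
        """A human finished reviewing `card`; the freed slot pulls the next one."""
        self.in_flight.remove(card)
        return self.pull()


board = Board(review_capacity=2, backlog=deque(f"PR-{n}" for n in range(1, 21)))
print(board.pull())           # ['PR-1', 'PR-2']
print(board.finish("PR-1"))   # ['PR-3']
print(len(board.backlog))     # 17
```

The agents could produce all twenty cards within the hour, but nothing enters the board until a reviewer slot opens. The remaining seventeen stay in the backlog as options, not as inventory.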

3. Continuous flow

Agents don't keep your hours. They run in parallel, finish at odd times, and don't care about a sprint boundary. Corralling that into two-week batches is like trying to schedule the tide. Continuous flow — work moving whenever it's ready — is the natural shape of a system where the workers are always on and never synchronized.

4. Cycle time over velocity

  STOP MEASURING          START MEASURING
  ──────────────          ───────────────
  velocity        ✗       cycle time      ✓   idea → in customers' hands
  story points    ✗       flow efficiency ✓   working time ÷ time in flight
  "how much built"✗       escape rate     ✓   bugs reaching production

Velocity asks "how much did we build?" — a vanity metric when building is cheap. Cycle time asks "how fast does an idea travel from worth building to validated?" That maps to the actual bottleneck. Kanban is built around it by default.
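The three replacement metrics are simple enough to compute from card timestamps. The sketch below is one possible reading, not a standard definition: the function names are hypothetical, and the denominator for escape rate (shipped changes) is an assumption, since the table only says "bugs reaching production".

```python
from datetime import datetime, timedelta


def cycle_time(idea_accepted: datetime, validated: datetime) -> timedelta:
    """Elapsed time from 'worth building' to 'validated with customers'."""
    return validated - idea_accepted


def flow_efficiency(working_time: timedelta, time_in_flight: timedelta) -> float:
    """Share of the in-flight time during which the work was actively progressing."""
    if time_in_flight <= timedelta(0):
        raise ValueError("time_in_flight must be positive")
    return working_time / time_in_flight


def escape_rate(bugs_in_production: int, changes_shipped: int) -> float:
    """Bugs reaching production per shipped change (the denominator is an assumption)."""
    if changes_shipped <= 0:
        raise ValueError("changes_shipped must be positive")
    return bugs_in_production / changes_shipped


# Made-up numbers, for illustration only.
accepted = datetime(2025, 1, 6, 9, 0)
validated = datetime(2025, 1, 8, 9, 0)
print(cycle_time(accepted, validated))                          # 2 days, 0:00:00
print(flow_efficiency(timedelta(hours=6), timedelta(days=2)))   # 0.125
print(escape_rate(3, 60))                                       # 0.05
```

A flow efficiency of 0.125 in this example means the card sat idle for most of its two days, which is the kind of waiting a velocity chart never shows.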

5. Classes of service

Kanban's classes of service map cleanly onto something an AI-native team desperately needs: differentiated verification by risk. A cosmetic change and a billing change shouldn't get the same human attention — and when both can be built in five minutes, the only thing distinguishing them is how you gate them. The board encodes that, visibly.

6. Make work visible

A human teammate's work is legible — you see them in the standup, the PR, at their desk. An agent's work is invisible by default; it happens in a process you didn't watch. The board is the antidote: every piece of work, human or agent, is a card in a column. In a team where most of the doing is done by entities that don't speak up in the standup, visualizing the work isn't a nicety — it's how you keep a grip on reality.

Put those six together and the conclusion writes itself:

  KANBAN PRIMITIVE        WHY IT FITS HUMAN + AGENT WORK
  ────────────────        ──────────────────────────────
  pull, not push          agent capacity is always on; pull when a HUMAN frees
  WIP limits              the cap = human review capacity, the new bottleneck
  continuous flow         agents run parallel, async, off-hours
  cycle time              measures the real bottleneck, not build throughput
  classes of service      differentiated verification by risk
  visualized work         agent work is invisible by default — make it loud

Kanban isn't a stylistic preference over Scrum. It's the operating system whose primitives match the constraints of human-and-agent collaboration. Scrum's primitives matched the constraints of slow human building. The constraints changed. The OS should too.

The principles that hold it together

Roles and a board aren't enough. You need a few load-bearing principles everyone internalizes, so day-to-day decisions don't need a meeting. Each is a default you should have to argue your way out of, not into.

The agent lives where the work happens. Code review in the PR. Triage in the chat. Specs in the repo next to the code. Design in the design tool. An agent bolted onto a separate console nobody opens is theater.

AI-first, not AI-assisted. Design every flow assuming an agent does every step. Then mark the few steps that genuinely need a human, and write the reason next to each — judgment, approval, regulation, irreducible ambiguity. No reason? It's an agent step. This single discipline separates "we use AI" from "we are AI-native."
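The "no reason, no human step" rule can be encoded directly in how a flow is written down. This is a minimal sketch under assumptions: `FlowStep`, its fields, and the example step names are hypothetical, and the reasons are free text rather than a fixed vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowStep:
    name: str
    human_reason: Optional[str] = None   # e.g. "judgment", "approval", "regulation"

    @property
    def owner(self) -> str:
        """A step with no written reason belongs to an agent."""
        has_reason = bool(self.human_reason and self.human_reason.strip())
        return "human" if has_reason else "agent"


flow = [
    FlowStep("draft spec from issue"),
    FlowStep("approve spec", human_reason="judgment: domain expert confirms intent"),
    FlowStep("implement change"),
    FlowStep("merge decision", human_reason="approval: reviewer owns the ship call"),
    FlowStep("extra sign-off", human_reason="   "),   # blank reason, so it stays an agent step
]

for step in flow:
    print(f"{step.owner:5}  {step.name}")
```

Because ownership is derived from the reason rather than declared, nobody can add a human step without writing down why it exists.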

One source of truth, and it's the repo. Specs, decisions, definitions of done — files next to the code, version-controlled, diffable, readable by the next agent. Presentation tools mirror it. When two sources disagree, the repo wins.

Working artifact over document. The answer to "can you do this?" is "yes, here's it running" or "no, next." Not a deck. When building is cheap, the cheapest way to settle an argument is to build the smaller version and look at it.

Architect the agent, not just the system. Build the repo so a brand-new agent could be productive from the files alone. If it needs a chat thread and a tap on the shoulder to start, the environment is broken — not the agent.

The team making the change approves it. A central code-owner reviewing every change is a bottleneck that gets disproportionately worse as throughput climbs: the closer incoming changes get to what one reviewer can clear, the faster the queue grows. The team that made the change owns review and merge. Only cross-cutting concerns escalate.
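The queueing intuition behind this principle can be sketched with a standard single-server model. Everything here is an illustration under assumptions that are not in the post: changes arrive at random at an average rate, one central reviewer clears them at an average rate, and the textbook M/M/1 result applies, under which mean time in the system is 1 / (service rate - arrival rate). The rates are made-up numbers.

```python
def mean_time_in_review(arrivals_per_day: float, reviews_per_day: float) -> float:
    """Mean days a change spends queued plus in review (M/M/1 assumption)."""
    if arrivals_per_day >= reviews_per_day:
        return float("inf")   # at or above capacity the queue never drains
    return 1.0 / (reviews_per_day - arrivals_per_day)


CENTRAL_REVIEWER_CAPACITY = 10.0   # hypothetical: reviews per day

for arrivals in (5.0, 8.0, 9.0, 9.5, 12.0):
    days = mean_time_in_review(arrivals, CENTRAL_REVIEWER_CAPACITY)
    print(f"{arrivals:>4} changes/day -> {days:.2f} days per change")
```

Under these assumptions, going from 5 to 9 changes per day raises the time per change from 0.2 to 1.0 days, and any arrival rate at or above the reviewer's capacity produces a queue that grows without bound. Spreading review across the teams that make the changes adds servers instead of lengthening one line.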

Security, telemetry, and feedback run in parallel — never a post-merge gate. If signal-gathering switches on after shipping, you've missed the window. Wire the verification and listening into the build.

The part everyone gets wrong: the human is not always in the loop

"Human in the loop" has become a comfort blanket. People say it to feel safe and stop thinking. But the honest finding on human-and-AI collaboration is genuinely uncomfortable: a human plus an AI, naively combined, often performs worse than the better of the two alone. Not better. Worse.

Why? Because the human in the middle has to make a second-order judgment they're bad at: when do I trust the machine, and when do I overrule it? Get it wrong either way and you've added a failure mode.

  THE INTERVENTION TRADE-OFF

  over-trust   ──────────────●──────────────  under-trust
  wave through                 ▲                discard correct
  the machine's        calibration is           answers, inject
  errors               the only safe spot       your own mistakes

The skill that matters isn't "stay in the loop." It's trust calibration — knowing precisely where your judgment adds value and where it just adds noise. Three traps follow, and an AI-native model has to design against all three:

The intervention paradox. "Always keep a human in the loop" isn't a strategy — it's an abdication of the real decision, which is which loops and where. A human at every step doesn't make the system safer; it often makes it worse and always slower. The work is to find the few high-leverage intervention points and staff those well. (That's exactly what the gates do — below.)

Cognitive atrophy. The one that should keep you up at night:

  THE ATROPHY SPIRAL

      trust the AI more
            │
            ▼
   exercise judgment less ───▶ judgment gets worse
            ▲                          │
            │                          ▼
   forced to trust AI more ◀── less able to catch its errors

There's causal evidence that leaning on AI assistance erodes the deep skills — reading code, debugging, conceptual understanding — that are exactly what you need to review AI output well. The reviewer's competence is what keeps the whole system honest, and it erodes silently. Defend it on purpose: keep some work AI-free as deliberate practice, pair seniors with juniors so the skill transfers, reward asking the machine "why" rather than just taking the "what."

Collaborative isolation. When your main collaborator is a model, you talk to humans less. It sounds efficient and it quietly corrodes teams — loneliness, drift, worse work. The fix isn't to slow the agents down; it's to design human-to-human touchpoints on purpose. The standup survives here not as status reporting (the board does that) but as the thing that keeps a team of people who mostly talk to machines still talking to each other.

The throughline: in an AI-native team you no longer get judgment, competence, or human connection for free as a byproduct of the work. You design for all three deliberately. That's a management responsibility now, not an emergent property.

Gates: the few places a human must say "yes"

If most steps are agent steps, the human approvals that remain have to earn their place. Keep the list short. The test for any gate:

Is this preventing a specific way things have actually gone wrong — or is it just there for comfort?

If it's comfort, delete it. Comfort gates are how a fast process slowly rots back into a slow one. The durable set is small:

  [G1] Architect @ skeleton          before agents swarm cross-cutting work
  [G2] Spec sign-off @ domain expert before significant building starts
  [G3] Team reviewer @ pull request  before merge — NEVER rubber-stamp
  [G4] Design @ user-facing surface  inline with the spec, not bolted on
  [G5] QA sampling                   risk-based, not one-to-one
  [G6] Security @ sensitive surface  parallel with build, closes pre-merge

Everything not on this list defaults to an agent step. The posture: agents by default, humans by exception, and every exception has a written reason. The work's size and risk trigger the gates automatically — you don't re-litigate them in a meeting:

  Lightning / Trivial      → G3
  Incremental / Standard   → G2, G3, G5  (+ G4 if user-facing)
  Foundation / High-risk   → G1, G2, G3, G5  (+ G4 if user-facing, G6 if sensitive)
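Because the mapping is mechanical, it can live in code next to the board instead of in people's heads. The following sketch mirrors the three rows above literally; the function name `required_gates` and the flag names are hypothetical.

```python
BASE_GATES = {
    "lightning": {"G3"},
    "incremental": {"G2", "G3", "G5"},
    "foundation": {"G1", "G2", "G3", "G5"},
}


def required_gates(radius: str, user_facing: bool = False, sensitive: bool = False) -> list[str]:
    """Return the gates a piece of work must pass, following the three rows above."""
    gates = set(BASE_GATES[radius])   # a KeyError on an unknown radius is intentional
    if user_facing and radius in ("incremental", "foundation"):
        gates.add("G4")
    if sensitive and radius == "foundation":
        gates.add("G6")
    return sorted(gates)


print(required_gates("lightning"))                                      # ['G3']
print(required_gates("incremental", user_facing=True))                  # ['G2', 'G3', 'G4', 'G5']
print(required_gates("foundation", user_facing=True, sensitive=True))   # ['G1', 'G2', 'G3', 'G4', 'G5', 'G6']
```

The sketch follows the rows as written, so a sensitive change sized as incremental does not pick up G6. A team that wants security review to fire on any sensitive surface would widen that condition.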

Sizing by radius, not by time

One replacement deserves its own section, because it's where teams cling hardest to the old world: estimation.

Stop estimating time. When an agent's build path is non-deterministic and build cost is collapsing, "how many days will this take?" is both unanswerable and pointless. The energy you spent estimating should move upstream — to deep agreement on why and what.

Replace time-estimates with radius — the blast radius of the change, which is what actually determines how much human care it needs:

  lightning     1 repo · few files · 1 surface · no contract change
                → anyone drives it via an agent; team reviewer at PR
                → no architect, no spec

  incremental   1 repo · contained module · 1 surface
                → engineer-led or PM/eng hybrid
                → short spec or rich issue body; architect optional

  foundation    2+ repos OR cross-cutting (schema / public API / event
                contract / auth / data model) OR a large surface
                → ARCHITECT pulled in at skeleton — mandatory
                → written design + comment window; multi-team review

The radius does real work: it triggers the right gate and picks the right reviewer automatically. Size something Foundation and the architect requirement fires on its own. Tie-break: when unsure, pick the larger bucket; cross-cutting beats file count.
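The sizing rules, including both tie-breaks, fit in a few lines. This is a rough sketch, not the contents of `sizing.md`: the `Change` fields are hypothetical, and "few files" is left as a yes/no judgment because the post does not give a threshold.

```python
from dataclasses import dataclass

BUCKETS = ["lightning", "incremental", "foundation"]   # smallest to largest


@dataclass
class Change:
    repos_touched: int = 1
    few_files: bool = True        # a team judgment; no numeric threshold is implied
    cross_cutting: bool = False   # schema, public API, event contract, auth, data model
    large_surface: bool = False
    unsure: bool = False          # the person sizing is not confident in the answers


def size(change: Change) -> str:
    # Cross-cutting beats file count, so it is checked before anything about files.
    if change.repos_touched >= 2 or change.cross_cutting or change.large_surface:
        index = 2
    elif change.few_files:
        index = 0
    else:
        index = 1
    # Tie-break: when unsure, pick the larger bucket.
    if change.unsure:
        index = min(index + 1, len(BUCKETS) - 1)
    return BUCKETS[index]


print(size(Change()))                      # lightning
print(size(Change(few_files=False)))       # incremental
print(size(Change(cross_cutting=True)))    # foundation
print(size(Change(unsure=True)))           # incremental
```

Note that a two-file change to an auth contract still sizes as foundation, which is the point of sizing by radius instead of by effort.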

And phase big work by step and exit condition, not by week. "Step 3 is done when the migration runs clean on staging" is a real boundary. "Week 2" is a wish. Calendar dates stay legal as targets — launch dates, cycle-time goals — they're just not a measure of work size. Don't confuse a deadline with a size.
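A plan phased by exit condition can be represented as data with a check attached to each step. The sketch below assumes a hypothetical `Phase` type and stand-in status flags; in practice the checks would read a CI or staging result.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Phase:
    name: str
    exit_condition: str          # human-readable statement of "done"
    is_met: Callable[[], bool]   # hypothetical check for that condition


def current_phase(plan: list[Phase]) -> Optional[Phase]:
    """Return the first phase whose exit condition is not yet met, or None if all are."""
    for phase in plan:
        if not phase.is_met():
            return phase
    return None


# Stand-in status flags; a real check would query staging or CI instead.
status = {"schema_reviewed": True, "migration_clean_on_staging": False}

plan = [
    Phase("Design schema", "schema reviewed by the architect",
          lambda: status["schema_reviewed"]),
    Phase("Run migration", "migration runs clean on staging",
          lambda: status["migration_clean_on_staging"]),
]

phase = current_phase(plan)
print(phase.name if phase else "all phases complete")   # Run migration
```

Nothing in the plan mentions a week. Progress is whatever the exit conditions say it is, and a target date can sit beside the plan without being mistaken for its size.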

How to actually run this on Monday

A philosophy you can't operationalize is a TED talk, not a way of working. So I turned everything above into a set of files you can drop into a repo — version-controlled, agent-readable, the source of truth for how the team actually works. Think of them like a README or a design system: not prose to admire, but a contract to run on.

  your-repo/
  └── wow/
      ├── wow.md                 the one-page operating model — start here
      ├── roles.md               Architect / Orchestrator / Reviewer / Agent
      ├── workflow.md            the step-by-step delivery flow
      ├── kanban.md              board design — columns, WIP, classes, metrics
      ├── sizing.md              radius sizing — size work without estimating time
      ├── gates.md               the human approval gates
      └── definition-of-done.md  done = verified against a signal

File	Use it to…
wow.md	Onboard anyone (or any agent) to how the team works in five minutes
workflow.md	Run a piece of work from idea to validated, step by step
roles.md	Settle "who does what" without a meeting
kanban.md	Set up columns, WIP limits, classes of service, and the right metrics
sizing.md	Size any work in 30 seconds without estimating time
gates.md	Decide, unambiguously, where a human must say yes
definition-of-done.md	Define "done" as verified against a signal, per team

Edit them — they're defaults, not commandments. The point of putting them in the repo is that they're the same artifact a new teammate reads, a skeptic argues with via pull request, and an agent loads as context before doing the work. One source of truth, for humans and machines, living next to the code it governs.
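If the files are the contract, it is cheap to check that the contract is actually present. Here is a small sketch that reports which of the seven files listed above are missing from a repo's `wow/` directory; the function name `missing_wow_files` is hypothetical, and the check only looks for existence, not content.

```python
from pathlib import Path

WOW_FILES = [
    "wow.md",
    "roles.md",
    "workflow.md",
    "kanban.md",
    "sizing.md",
    "gates.md",
    "definition-of-done.md",
]


def missing_wow_files(repo_root) -> list[str]:
    """Return the operating-model files that are absent from <repo_root>/wow."""
    wow_dir = Path(repo_root) / "wow"
    return [name for name in WOW_FILES if not (wow_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_wow_files(".")
    if missing:
        print(f"missing from wow/: {', '.join(missing)}")
    else:
        print("wow/ is complete")
```

Run from the repo root, this could serve as a lightweight CI step so that an agent loading the files as context never starts from a partial set.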

The shape of the thing

Let me end where I started, but with the full picture in view.

We built Agile to manage the cost and risk of slow building. It was a great answer. AI is removing the thing it answered. So the work now isn't to do Agile faster — it's to rebuild the operating model around the constraints that are actually binding: deciding well, verifying well, and keeping human judgment sharp while the machine does the building.

  THE WHOLE MODEL ON ONE CARD

  humans          → architect, orchestrate, review
  agents          → build
  method          → Kanban, not Scrum
  the WIP limit   → human review capacity (the new scarce resource)
  master skill    → trust calibration
  what's "done"   → verified against a signal, not merged
  defend on purpose → judgment · competence · human connection

None of this is finished. It's a v0, written to be argued with — ideally via pull request against those files. That's the whole spirit of it: the way of working is itself a thing you build, ship, measure, and revise. Start simple. Scale with intention. And don't let the amplifier point at noise.