Your Agents' Real Bottleneck Isn't the Model — It's the Memory You Never Built
McKinsey just promised you round-the-clock software delivery.
In "Rewiring software delivery for the agentic era", they describe a world where the two-week sprint compresses into a daily one, agents execute overnight while you sleep, and teams of eight to twelve give way to small pods that supervise. It's a lovely promise, and the underlying diagnosis is mostly right. But, like almost every lovely promise from a consultancy, it skips the boring part: the reason your agents aren't pulling the night shift yet isn't the model they're running on. It's that you never built them a memory.
I'm not writing this from the consultant's seat, the one that draws the operating model. I'm writing it from the seat that operates the platform these agents land on. I've done enough DevOps and enough 3 a.m. on-call to know one thing: an autonomous agent at night is exactly the same thing as a brand-new on-call engineer. Brilliant, tireless, and completely useless if nobody has written down how the system actually works. The model gives it a brain. A brain isn't what it's missing.
The model isn't the bottleneck; the missing context is
Think about what an agent actually needs to ship a change against your systems without breaking anything. It doesn't need to be smarter. It needs to know what "a risk client" means in your industry. It needs to know which of the six steps in that refund process must never be skipped. It needs to know why that one service was built in such an odd way — and that it was because of an incident last quarter that ended with the decision to never auto-retry against that endpoint.
None of that lives in the model's weights. It lives in three people's heads, in a Slack thread nobody will ever find, in a Confluence page that hasn't been touched in a year, in a closed Jira ticket, in a post-mortem nobody reread. The most powerful model on the planet, without that context, is a day-one hire with no onboarding. Give it the best brain on the market: if it doesn't know how your company works, it will guess. And an agent that guesses in production isn't productivity — it's debt.
McKinsey's four conditions are really just one
The article lists four conditions for agents to work: a clear business vision of what to build, a standard technology environment with common frameworks and modular architecture, a standard structure from requirement to code so inputs are predictable, and core stakeholders staying engaged across the value stream.
Four boxes. But look closely and they all point at the same place. They aren't four independent requirements; they're four faces of a single substrate. All four are about making your organization's context legible and reliable enough for a machine to reason over it. The clear vision is context about the what. The standard environment is context about the how. The requirement-to-code structure is context in a format the agent can consume without interpreting. And the engaged stakeholders are the guarantee that this context doesn't drift out of sync with reality halfway through the week. McKinsey sells it as a checklist; in the trenches it's one thing, and it has a name.
The knowledge graph is the AI's memory layer
Here's where the article says the genuinely interesting thing, and it's the one to take home. The companies out front are building knowledge graphs that act as an AI memory layer across the entire software development life cycle, one per domain: they connect customer feedback, architecture decision records, design documents, tickets, GitHub activity, incident reports, and compliance rules.
The key word is connect. A RAG system over a wiki — which I've written about for anyone integrating LLMs — retrieves the paragraph whose words match your question. Useful, but flat. A real memory layer knows something else: it knows that this incident triggered that decision record, which constrains this service, whose owner wrote that compliance rule. The value isn't in the nodes; it's in the edges. The difference between the two is the difference between an agent that quotes your wiki and an agent that respects your scars.
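To make the nodes-versus-edges point concrete, here is a minimal sketch of what "connected" buys you. Everything in it is an assumption made up for illustration: the node names, the relation labels, and the `context_for` helper are hypothetical, and a real memory layer would sit on a graph store rather than a Python list.

```python
from collections import defaultdict, deque

# Hypothetical graph: each edge is (source, relation, target).
EDGES = [
    ("incident:INC-1", "triggered", "adr:no-auto-retry"),
    ("adr:no-auto-retry", "constrains", "service:payments-gateway"),
    ("person:maria", "owns", "service:payments-gateway"),
    ("person:maria", "wrote", "rule:refund-audit"),
]

def context_for(node, edges, max_hops=2):
    """Collect every edge within max_hops of node, following edges in both directions."""
    neighbors = defaultdict(list)
    for edge in edges:
        source, _, target = edge
        neighbors[source].append((target, edge))
        neighbors[target].append((source, edge))

    seen_nodes = {node}
    found = []
    frontier = deque([(node, 0)])
    while frontier:
        current, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for other, edge in neighbors.get(current, []):
            if edge not in found:
                found.append(edge)
            if other not in seen_nodes:
                seen_nodes.add(other)
                frontier.append((other, hops + 1))
    return found

for source, relation, target in context_for("service:payments-gateway", EDGES):
    print(f"{source} --{relation}--> {target}")
```

A keyword search for "payments gateway" would never surface the incident or the compliance rule, because neither mentions the service by name. The traversal finds both because the relationships were recorded as edges.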
And this is exactly the layer I argued was the moat: the model commoditizes the easy 80%, and differentiation moves to the system wrapped around it. The memory of how your company actually works is the part of that system no model vendor can sell you, because they don't have it.
We've encoded tribal knowledge before, and we called it infrastructure as code
If this sounds familiar, it's because those of us who came up through operations have made this move a few times already. Every leap toward autonomy, without exception, has been the same gesture: take knowledge that lived in a senior engineer's head and encode it so a machine can act on it.
Hand-run runbooks became automated remediation. The deploy steps only one person knew became a CI/CD pipeline. "Ask Maria, she knows how prod is wired" became infrastructure as code. And here's the detail everyone forgets: the pipeline didn't start running itself because the tools got smart. It started running itself once we wrote down what Maria knew. Agents shipping software overnight is exactly that lesson, one floor up. The agent runs unsupervised up to precisely the point its context allows, and not one step further. The substrate is no new magic; it's tribal knowledge, finally written in a form a machine can traverse.
24-hour delivery is the prize, not the goal
That's why the daily cadence McKinsey sells is real, but it sits downstream of the substrate, not ahead of it. Overnight execution works as far as the agent's memory lets it run alone; past that line, it stops and waits for a human. So the metric that matters isn't "can agents run at night." It's how far the agent gets before it hits a question only a person can answer. And every one of those stops is a context bug in your memory layer, not a model failure.
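One way to track that metric, as a rough sketch: log each step of an overnight run, record what the agent was missing whenever it had to stop, and count both. The `Step` shape, the step names, and the blocker labels below are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    name: str
    # The piece of context the agent was missing, if it had to stop and ask.
    blocked_on: Optional[str] = None

def steps_before_first_stop(run):
    """Count how many steps the agent chained before it first needed a human."""
    count = 0
    for step in run:
        if step.blocked_on is not None:
            break
        count += 1
    return count

def context_bugs(runs):
    """Tally the missing context behind every stop, across all runs."""
    return Counter(
        step.blocked_on
        for run in runs
        for step in run
        if step.blocked_on is not None
    )

# Hypothetical overnight runs.
monday = [Step("read ticket"), Step("write patch"), Step("pick refund step order", "refund step order")]
tuesday = [Step("read ticket"), Step("choose retry policy", "retry policy for payments endpoint")]
wednesday = [Step("read ticket"), Step("pick refund step order", "refund step order")]

print(steps_before_first_stop(monday))
print(context_bugs([monday, tuesday, wednesday]).most_common(1))
```

The tally is the useful part: the most frequent blocker is the next thing to write into the memory layer.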
Here I have to concede the strong counter-argument, because it's a good one: "Models get better every quarter and context windows keep growing. Won't the next model just eat all of this?" Partly, yes. Some of today's scaffolding will be absorbed: better models need less hand-holding, bigger windows swallow more documents at once. But a context window is not a memory. Pasting your whole wiki into the prompt doesn't make the model know which step must never be skipped; it makes it read a pile of possibly-contradictory text and guess. Knowing requires curation, verification, freshness, and conflict resolution: deciding which of two contradictory sources is the truth that holds today. That's judgment, a human does it, and it's permanent engineering work. The model is rented, and identical for the company across the street. The curated memory of how your business actually works is the part nobody can rent.
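A small sketch of what curation, verification, and freshness can look like in data, under stated assumptions: the `Claim` fields, the 180-day freshness window, and the example statements are all hypothetical. Note that the code does not decide the truth. It only trusts what a person has recently verified, and it hands the question back to a human when nothing qualifies or when verified claims disagree.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class Claim:
    source: str
    statement: str
    verified_by: Optional[str] = None   # the person who confirmed it, if anyone
    verified_on: Optional[date] = None  # when they confirmed it

def resolve(claims, today, max_age_days=180):
    """Return the claim to trust, or None when a human has to decide."""
    max_age = timedelta(days=max_age_days)
    fresh = [
        c for c in claims
        if c.verified_by and c.verified_on and (today - c.verified_on) <= max_age
    ]
    if not fresh:
        return None  # nothing verified recently enough: escalate
    if len({c.statement for c in fresh}) > 1:
        return None  # fresh, verified claims contradict each other: escalate
    return max(fresh, key=lambda c: c.verified_on)

today = date(2025, 6, 1)
wiki = Claim("wiki", "retry up to 3 times")
adr = Claim("adr", "never auto-retry", verified_by="maria", verified_on=date(2025, 5, 1))

winner = resolve([wiki, adr], today)
print(winner.statement if winner else "ask a human")
```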
What I'd do this quarter if I were your CTO
Five concrete bets, because a diagnosis without action is just a nice opinion:
- Find out where your context actually lives. Before you buy any "agent factory," do the uncomfortable inventory: how much of the knowledge an agent would need to ship code lives only in heads, in Slack threads, and in a stale wiki? That answer is your bottleneck. Not the model.
- Build the memory for ONE domain, not the whole company. A knowledge graph for the entire SDLC in one shot is a project that dies in committee. Pick a domain with real pain, connect its decision records, tickets, incidents, and compliance rules, and have an agent reason over it. Learn there before you scale.
- Standardize the path from requirement to code. It's the McKinsey condition that genuinely moves the needle. If every feature arrives in a different format, the agent guesses; if it arrives in a predictable structure, it executes. Reproducible inputs before autonomous outputs.
- Bake compliance into the memory, not onto the end. Risk, legal, and security rules should be nodes the agent reads while it builds, not a gate someone opens once everything is done. A control that lives in the graph improves traceability and completeness; a control that lives in a PDF is a bottleneck with a human face.
- Measure autonomy by how far the agent gets alone. Forget "percentage of code written by AI." The honest metric is how many steps an agent chains before it needs a human, with each stop treated as a context bug you fix in the substrate, not as a ceiling of the model.
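For the third bet, standardizing the path from requirement to code, a predictable structure can start as small as a required-fields check that runs before the agent does. This is a sketch only: the field names and the example requirement are hypothetical, not a format the McKinsey article prescribes.

```python
# Hypothetical minimum every requirement must carry before an agent picks it up.
REQUIRED_FIELDS = ("goal", "domain", "acceptance_criteria", "owner")

def missing_fields(requirement):
    """Return the required fields that are absent or empty in a requirement dict."""
    return [field for field in REQUIRED_FIELDS if not requirement.get(field)]

def ready_for_agent(requirement):
    """An agent should execute only when nothing is left for it to guess."""
    return not missing_fields(requirement)

requirement = {
    "goal": "Let support agents issue partial refunds",
    "domain": "payments",
    "acceptance_criteria": [],  # empty: nobody has said what "done" means yet
}

print(missing_fields(requirement))
print(ready_for_agent(requirement))
```

A requirement that fails the check goes back to its author instead of forward to the agent, which turns a silent guess at 3 a.m. into a visible question during the day.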
The line I keep defending is the same as ever, and here AI illustrates it more literally than I've ever seen: it doesn't replace the engineer; it gives them leverage. Someone has to decide which of two contradictory sources is the truth, which step is never skipped, when a piece of knowledge has gone stale. That work, building and maintaining the memory of how your company actually works, isn't done by the model. It's done by an engineer with judgment. And the more autonomous you want your agents to be, the more of those engineers you need, not fewer.
McKinsey sells the destination: agents shipping software while you sleep. They're right that it's possible. What the slide spares you is that the cheap, interchangeable engine was never the problem. The problem, as always, is the whole car around it — and this time, the hardest piece to build is called memory.


