The EU Wants AI Proven Safe Before It Ships. That's Not How AI Fails.

By Marc Molas·June 16, 2026·10 min read

Every engineer who has run something at scale has learned the same humbling lesson: you cannot prove a complex system safe before it runs. You review it, you test it, you reason about every path you can think of — and it still surprises you in production, because production is where the inputs you never imagined finally show up. So we stopped pretending. We instrument the system, we watch it, and we build the reflexes to respond when it does something we didn't predict. The European Union is regulating AI as if the opposite were true.

On 11 June, Bruegel published a policy brief by Mario Mariniello — The right balance: how to fix European Union artificial intelligence regulation — that says, in the measured language of a Brussels think tank, roughly what an SRE would tell you after a bad night on call: the AI Act bets too much on proving safety before deployment, and too little on catching harm after it. And it puts a price on the bet. Citing Haataja and Bryson, Mariniello estimates compliance for an average AI system at €14,623 to €29,277 — 9 to 17 percent of a €170,000 development budget. Read it as money: between 9 and 17 cents of every euro you spend building an AI system goes to proving it compliant, before it reaches a single user.
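The percentages are easy to check. A minimal sketch of the arithmetic, using only the figures quoted above (the function and constant names are mine, for illustration):

```python
# Figures as quoted above from the brief (citing Haataja and Bryson).
DEV_BUDGET_EUR = 170_000
COMPLIANCE_LOW_EUR = 14_623
COMPLIANCE_HIGH_EUR = 29_277


def compliance_share(cost_eur: float, budget_eur: float) -> float:
    """Return compliance cost as a percentage of the development budget."""
    return 100 * cost_eur / budget_eur


low = compliance_share(COMPLIANCE_LOW_EUR, DEV_BUDGET_EUR)
high = compliance_share(COMPLIANCE_HIGH_EUR, DEV_BUDGET_EUR)
print(f"{low:.1f}% to {high:.1f}%")  # prints "8.6% to 17.2%"
```

So "9 to 17 percent" is the rounded range: roughly 8.6 to 17.2 cents of every development euro.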

I ship production AI systems that fall under this regime, and before that I spent years in DevOps and incident response, where the whole job lived in the gap between what we'd signed off and what actually happened at 3 a.m. At Conectia we built our hiring preselection to the AI Act's high-risk rules before the law reached it, so I'm not an outsider grumbling about red tape. I've paid this bill on purpose. My problem isn't that Europe regulates AI. It's that it spends the most rigor on the one day it knows the least.

You can't certify a system whose behavior is discovered after it ships

The AI Act sorts systems by their intended purpose at the moment they're placed on the market, and it demonstrates safety largely through conformity paperwork filed before launch. But AI doesn't hold still for the photo. Most of a model's real capabilities — and its real failure modes — are discovered after deployment (the brief leans on Bengio et al., 2024, for this, and anyone who has shipped an LLM feature already knows it in their bones). A system that is minimal-risk on launch day becomes high-risk the moment a user does something the developer never intended and doesn't control.

The evidence is piling up in exactly the category the Act waves through as low-risk: chatbots. A May 2026 study of Scottish election queries found 34.1% of AI chatbot answers contained factual errors about how to vote. In Nevada, four of five chatbots tested gave wrong voter-registration information. And in August 2025, Raine v. OpenAI put a chatbot's role in a teenager's suicide in front of a San Francisco court. None of these harms is something a pre-launch checklist could have caught — because none of them existed at launch. Ex-ante conformity certifies a snapshot. The system is a movie.

We already walked out of this movie: it's called waterfall

Engineers have run this exact experiment, and we left. For years we tried to certify a release correct before shipping it: sign-offs, change-approval boards, a gate you cleared once and then prayed. We abandoned it — not because we got reckless, but because the proof could never keep pace with the rate of change. The whole DevOps and SRE movement was the migration of rigor from before the release to around the running system: observability, error budgets, rollback, blameless postmortems. We didn't stop caring about safety. We moved it to where the risk actually lives.

Mariniello's ex-post toolkit makes the same migration for AI regulation, in three parts. Detection infrastructure modeled on the FDA's Sentinel system: sample the live traffic, watch for adverse events in near-real time. A non-punitive near-miss reporting scheme modeled on aviation's, the Aviation Safety Reporting System that has run since 1976, which is a blameless postmortem at industry scale. A standardized incident taxonomy and a public registry, built on the OECD's 2025 framework, so the whole market learns from each failure once instead of each company learning it the hard way. Instrument, report, respond. I trust this approach for one boring reason: it's how I already run everything I'm responsible for.

The up-front bill quietly hands the market to the incumbents

There's a second problem with front-loading the cost, and it has nothing to do with safety — it's about who's left standing. That compliance bill doesn't scale down. Fifteen to thirty thousand euros and a quality-management system is a rounding error for a large firm and a wall for a two-person startup. GDPR contributed to market concentration by disproportionately burdening smaller firms. So a regime sold as protecting people from powerful AI can quietly end up protecting powerful AI companies from competition.

The brief's fix borrows from the Digital Services Act: tier the burden to the deployment's reach. A light-audit tier for SMEs (under €50M turnover, or fewer than 100,000 people affected), and only for reversible-harm uses; a standard tier in the middle; and an intense tier for the giants, above €150M or more than a million people affected, where third-party assessment is mandatory and self-certification ends. Match the scrutiny to the blast radius, instead of pointing one gate at everyone.

It's also worth looking at who'd be doing the catching today: more than 2,000 national market-surveillance authorities, most of them built to inspect physical products, against an AI Office of roughly 125 people. That is the kind of fragmented enforcement that already hobbles EU tech rules. The apparatus meant to catch a dangerous algorithm was largely designed to catch a dangerous toaster.
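To make the tiering concrete, here is a minimal Python sketch of the thresholds quoted above. The names `Tier`, `Deployment` and `audit_tier` are my own illustrations, not anything from the brief or the Act. The precedence between tiers is also an assumption: the text doesn't say what happens when a small firm reaches more than a million people, so this sketch checks the intense tier first.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LIGHT = "light"
    STANDARD = "standard"
    INTENSE = "intense"


@dataclass(frozen=True)
class Deployment:
    turnover_eur: float      # annual turnover of the deploying firm
    people_affected: int     # reach of this specific deployment
    harm_reversible: bool    # can a wrong outcome be undone?


def audit_tier(d: Deployment) -> Tier:
    """Map a deployment to an audit tier using the thresholds quoted above."""
    # Assumption: either "giant" threshold is enough for the intense tier,
    # and it wins over the light-tier test.
    if d.turnover_eur > 150_000_000 or d.people_affected > 1_000_000:
        return Tier.INTENSE
    # Light tier: small by turnover OR by reach, and only for reversible harm.
    small = d.turnover_eur < 50_000_000 or d.people_affected < 100_000
    if small and d.harm_reversible:
        return Tier.LIGHT
    return Tier.STANDARD


startup = Deployment(turnover_eur=2_000_000, people_affected=5_000, harm_reversible=True)
print(audit_tier(startup).value)  # prints "light"
```

Note what the reversibility flag does: the same two-person startup falls back to the standard tier the moment its use case involves harm that can't be undone.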

"Ship and watch" is also a great way to let the harm happen first

Here is the strongest objection, and I won't dodge it: ex-post monitoring can sound like "move fast and break things" with a regulatory blessing. And the cost is real and ugly: ex-post means the harm sometimes happens before the response arrives. An incident registry does nothing for the teenager in the Raine case. An election poisoned by chatbot misinformation doesn't roll back like a bad deploy. Anyone selling pure after-the-fact monitoring as a clean upgrade is burying that cost, and you shouldn't let them.

But that isn't the proposal, and the axis that resolves it is irreversibility. You don't observe-and-respond your way through a medical diagnosis or a self-driving car — there the gate stays up front, and Mariniello keeps strict liability for prohibited and high-risk systems precisely because the harm can be serious and final. The light-touch tier is explicitly fenced to reversible harm — employment, credit scoring — not diagnostics or autonomous vehicles. So the real thesis isn't "ex-ante bad, ex-post good." It's gate what can't be undone; instrument what can.

And one more honest caveat, because it cuts against the whole idea: ex-post only works if the catching is real. GDPR's own after-the-fact enforcement was fragmented and underpowered. If Europe shifts the burden to "after" and then under-funds the after — 125 people, 2,000 mismatched authorities, and still no AI-specific liability framework — it hasn't rebalanced anything. It has deregulated and called it monitoring. Mariniello has a name for the global version of that slide: "mutually assured deregulation." The liability piece is crucial: shifting the burden of proof off the victim, with a rebuttable presumption of defect for ordinary systems, so a wronged user isn't forced to reverse-engineer a model they can't see. The EU Product Liability Directive only becomes applicable on 9 December 2026. Until funded detection and real liability are both in place, "ex-post" is just a nicer word for "nobody's watching."

What I'd build into my AI stack no matter which way Brussels goes

The quietly useful thing about Mariniello's ex-post toolkit is that most of it is just good engineering you should own regardless of the law. If I were standing up — or auditing — an AI product this quarter, this is the work, and none of it waits on a regulation:

Map the liability chain before you ship, not after the lawsuit. For every AI feature, write down who is on the hook when it harms a user: you, the model provider, the data source. Mariniello's wrongly-denied-mortgage example — bank, developer, LLM provider, training-data provider, all pointing at each other — is your incident RACI. If that cell is blank, you are the default defendant.
Keep your own incident registry now. Adopt the OECD incident taxonomy and log every AI failure against it today. Don't wait for a mandatory EU registry to start learning from your own outages.
Open a blameless near-miss channel. Aviation's reporting system works because reporting is safe. Make it safe for your engineers to flag the model doing something strange before it graduates into a postmortem.
Sample your own traffic. Put observability on model inputs and outputs the way you'd monitor any production service — drift, anomalous responses, the use cases you never designed for. This is the Sentinel idea at company scale, and it's how you find the high-risk use before a regulator or a journalist does.
Tier your own scrutiny by irreversibility, not by org chart. Concentrate human review where a wrong answer can't be taken back; let the reversible work move fast behind monitoring. That's the brief's entire thesis, applied one floor down — and it's just good prioritization.
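The list above fits in surprisingly little code. What follows is a sketch under stated assumptions, not a reference implementation: every name in it (`Incident`, `IncidentRegistry`, `unassigned_features`, `should_sample`, `review_route`) is hypothetical, and the `HARM_TYPES` set is a placeholder I made up, not the OECD taxonomy, which you would substitute for it.

```python
import json
import random
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Placeholder categories for illustration only; NOT the OECD taxonomy.
HARM_TYPES = {"misinformation", "discrimination", "safety", "privacy", "other"}


@dataclass
class Incident:
    feature: str
    harm_type: str
    description: str
    near_miss: bool   # True when flagged before any user was harmed
    reversible: bool  # can the outcome be undone?
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if self.harm_type not in HARM_TYPES:
            raise ValueError(f"unknown harm_type: {self.harm_type!r}")


class IncidentRegistry:
    """In-memory incident log that can be exported as JSON Lines."""

    def __init__(self) -> None:
        self._records: list[Incident] = []

    def log(self, incident: Incident) -> Incident:
        self._records.append(incident)
        return incident

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(i)) for i in self._records)


def unassigned_features(liability_map: dict[str, str]) -> list[str]:
    """Features with a blank owner: the cells where you are the default defendant."""
    return sorted(f for f, owner in liability_map.items() if not owner.strip())


def should_sample(rng: random.Random, rate: float) -> bool:
    """Decide whether to capture this request/response pair for review."""
    return rng.random() < rate


def review_route(reversible: bool) -> str:
    """Human review where the answer can't be taken back; monitoring elsewhere."""
    return "monitor" if reversible else "human_review"


registry = IncidentRegistry()
registry.log(Incident("support_chatbot", "misinformation",
                      "Gave a wrong refund deadline", near_miss=True, reversible=True))
print(unassigned_features({"cv_screening": "us", "support_chatbot": ""}))
print(review_route(reversible=False))
```

In production the registry would sit on durable storage and the sampler would feed whatever observability stack you already run. The point of the sketch is only that none of these pieces needs a regulation to exist.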

Regulate AI like a system that runs, not a product that ships

The right balance was never "more regulation" or "less." It's regulation aimed at the moment where the risk actually lives. The AI Act spends its rigor on deployment day — the single day we know least about how the system will behave — and bills it to everyone equally, which lands hardest on the smallest. Mariniello's correction is to move that rigor where engineers already keep theirs: around the running system, scaled to reach, proportional to what can't be undone, and paid for mostly by whoever is large enough to bear it. That's not deregulation. It's the operational maturity the rest of us were forced into years ago — the night we learned you can't prove a system safe, you can only watch it closely and be ready to act.

The brief argues the EU shouldn't wait until 2031 to amend the Act. I'd go one step further: don't wait for Brussels at all. The observability and incident discipline that would make ex-post regulation actually work is the same discipline that makes your product survive contact with real users. If you're still mapping where the AI Act lands on your own stack, start with how Brussels reclassified an ordinary hiring tool as high-risk — and then go build the watching, because the regulator is eventually going to ask whether you did.
