
Building a Compliant AI Legal Engine: Multi-Model Routing, Legal RAG, and the EU AI Act in Practice

By Marc Molas·January 15, 2026·10 min read

Most AI products are built by choosing a model, writing some prompts, and shipping. That works for a chatbot. It doesn't work when the output carries legal weight, when the data is regulated, and when a wrong answer isn't just unhelpful — it's potentially harmful.

When we built the AI engine behind Bonus Iuri — a contract analysis platform that reviews Spanish legal documents against real legislation — every architectural decision had to balance three competing demands: reasoning quality, regulatory compliance, and cost sustainability at scale.

This post walks through the thinking behind the key decisions. Not a blueprint you can copy — but the principles that guided us through a domain where getting it wrong has real consequences.

The Core Problem: Legal AI That Doesn't Hallucinate

The fundamental challenge in legal AI isn't generating text that sounds legal. Any large language model can produce confident-sounding legal analysis. The challenge is producing analysis that is correct — that cites real articles of real laws, that identifies genuine risks based on established legal doctrine, and that clearly distinguishes between what the contract says and what the law requires.

Hallucinated legal references are not a minor inconvenience. A user who relies on a fabricated citation to article 47 of a law that only has 35 articles has been actively harmed by the product. This isn't an edge case to mitigate — it's the central problem to solve.

Our approach rested on three architectural pillars: retrieval-augmented generation designed specifically for legal text, a strict citation enforcement policy, and intelligent model routing that matches reasoning depth to task requirements.

Pillar 1: Standard RAG Fails for Legislation

Standard RAG implementations chunk documents into fixed-size text blocks — 512 tokens, 1,000 characters, whatever the default is — and retrieve the most similar chunks to the query. This works for general knowledge bases. It fails for legislation.

Legal documents have a rigid internal structure: articles, sections, subsections, transitional provisions, recitals. A fixed-size chunk that splits an article about rental deposits across two chunks loses the semantic coherence that makes the article meaningful. Worse, it can produce retrievals that combine the end of one article with the beginning of another, creating a chimeric reference that looks valid but isn't.

The principle: chunk at legal boundaries, not arbitrary token counts.

We built a section-aware chunking pipeline that parses legislative structure before splitting. The system detects article, section, chapter, and provision boundaries. Each chunk maps to a complete legal unit — typically one article with its subsections, or one coherent section of a chapter.
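As a rough illustration of chunking at legal boundaries, the sketch below splits a law's text at article headings so that each chunk holds one complete article. It is a minimal sketch, not the production pipeline: the heading pattern, the `LegalChunk` type, the `chunk_by_article` function, and the sample text are all assumptions made for this example, and real consolidated texts also need handling for chapters, transitional provisions, and preambles.

```python
import re
from dataclasses import dataclass

# Assumed heading format: lines such as "Artículo 36." or "Artículo 4 bis."
ARTICLE_HEADING = re.compile(
    r"^Artículo\s+(\d+(?:\s+(?:bis|ter|quater))?)\.", re.MULTILINE
)


@dataclass
class LegalChunk:
    law: str      # identifier of the law the chunk belongs to
    article: str  # article number as printed, e.g. "36" or "4 bis"
    text: str     # the full article, heading included


def chunk_by_article(law: str, body: str) -> list[LegalChunk]:
    """Split a law so that each chunk is exactly one complete article.

    Text before the first heading (e.g. a preamble) is ignored in this sketch.
    """
    headings = list(ARTICLE_HEADING.finditer(body))
    chunks = []
    for i, match in enumerate(headings):
        # An article runs until the next heading, or to the end of the text.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(body)
        chunks.append(
            LegalChunk(law, match.group(1), body[match.start():end].strip())
        )
    return chunks


# Invented placeholder text, not a real statute.
sample = (
    "Artículo 1. Ámbito de aplicación.\n"
    "Texto del primer artículo.\n"
    "1. Primer apartado.\n"
    "Artículo 2. Segundo artículo.\n"
    "Texto del segundo artículo.\n"
)
chunks = chunk_by_article("LEY-EJEMPLO", sample)
for chunk in chunks:
    print(chunk.article, len(chunk.text))
```

Because the split points are headings rather than token counts, a numbered subsection such as "1. Primer apartado." stays inside its parent article, and no chunk can mix the end of one article with the start of the next.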

The system covers seven consolidated bodies of Spanish legislation sourced from the BOE (Boletín Oficial del Estado): the Código Civil, Estatuto de los Trabajadores, Ley de Arrendamientos Urbanos, corporate law, commercial law, insolvency, and administrative procedure. Each is chunked at structural boundaries, vectorized, and deduplicated to prevent stale entries from accumulating.

Why freshness matters: Spanish legislation isn't static. Amendments and corrections appear regularly. A system that cites an outdated version of an article, one amended months ago, produces analysis that is simply incorrect. Keeping the legislation index current is an operational cost most prototypes ignore. In production, it's the difference between a reliable tool and a liability.
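One way to keep an index both deduplicated and current is to key each entry by law and article and compare content hashes on every refresh. The sketch below assumes that approach; `LegislationIndex` is a hypothetical in-memory stand-in for a vector store, and the texts are placeholders. In a real system, a "replaced" result would also trigger re-embedding of the article.

```python
import hashlib


class LegislationIndex:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self) -> None:
        # (law, article) -> (content hash, text)
        self._entries: dict[tuple[str, str], tuple[str, str]] = {}

    def upsert(self, law: str, article: str, text: str) -> str:
        """Store one article and report what happened to the index."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (law, article)
        current = self._entries.get(key)
        if current is not None and current[0] == digest:
            return "unchanged"  # identical text: nothing to re-embed
        self._entries[key] = (digest, text)
        # An amended article overwrites the old version instead of sitting next to it.
        return "inserted" if current is None else "replaced"

    def __len__(self) -> int:
        return len(self._entries)


index = LegislationIndex()
first = index.upsert("LAU", "36", "placeholder text, original version")
again = index.upsert("LAU", "36", "placeholder text, original version")
amended = index.upsert("LAU", "36", "placeholder text, amended version")
print(first, again, amended, len(index))
```

Keying on the legal unit rather than on the chunk text is what prevents an amended article and its superseded version from both being retrievable.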

Pillar 2: Citation Enforcement — "No Source, No Claim"

Even with legislation-aware RAG, an LLM can still generate plausible-sounding legal analysis that doesn't correspond to any retrieved source. The model might interpolate between two real articles, or recall training-data patterns that don't apply to Spanish law.

We enforced a strict rule: every legal assertion in the output must be traceable to a specific retrieved passage. If the system cannot ground a claim in an actual legislative text, the claim is not made.

The analysis pipeline validates citations at generation time. Each legal assertion is checked against the retrieved context: does the cited passage actually exist? Does the source document match? Is the relevance strong enough to support the claim? Assertions that fail validation are flagged rather than silently included.
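The three checks above can be sketched as a small validation pass over the model's assertions. This is an illustration under stated assumptions: the `Passage` and `Assertion` types, the `MIN_SCORE` threshold, and the use of the retrieval similarity score as the relevance signal are all hypothetical choices for this example, and the law identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    law: str
    article: str
    score: float  # retrieval similarity, assumed to lie in [0, 1]


@dataclass(frozen=True)
class Assertion:
    claim: str
    law: str
    article: str


MIN_SCORE = 0.75  # hypothetical relevance threshold


def check(assertion: Assertion, retrieved: list[Passage]) -> Optional[str]:
    """Return None if the assertion is grounded, else the reason it is flagged."""
    same_law = [p for p in retrieved if p.law == assertion.law]
    if not same_law:
        return "source document not in retrieved context"
    matches = [p for p in same_law if p.article == assertion.article]
    if not matches:
        return "cited article not in retrieved context"
    if max(p.score for p in matches) < MIN_SCORE:
        return "retrieved passage too weakly related"
    return None


def validate(assertions: list[Assertion], retrieved: list[Passage]):
    """Separate grounded assertions from flagged ones; nothing is dropped silently."""
    grounded, flagged = [], []
    for assertion in assertions:
        reason = check(assertion, retrieved)
        if reason is None:
            grounded.append(assertion)
        else:
            flagged.append((assertion, reason))
    return grounded, flagged


retrieved = [Passage("LAW-A", "12", 0.91), Passage("LAW-A", "20", 0.40)]
assertions = [
    Assertion("Grounded claim", "LAW-A", "12"),
    Assertion("Cites an article that was never retrieved", "LAW-A", "47"),
    Assertion("Weakly supported claim", "LAW-A", "20"),
    Assertion("Cites the wrong law", "LAW-B", "12"),
]
grounded, flagged = validate(assertions, retrieved)
for assertion, reason in flagged:
    print(f"{assertion.law} art. {assertion.article}: {reason}")
```

Returning a reason for each flagged assertion, rather than a bare boolean, keeps the failure visible to reviewers and to the user.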

The result is a transparency chain: the user can trace any legal claim back to a specific article of a specific law. That traceability is what separates useful legal AI from dangerous legal AI — and it's what gives Bonus Iuri the credibility to serve legal professionals, not just curious consumers.

Pillar 3: Model Routing Is a Product Decision, Not Just a Cost Lever

Not all tasks in a legal analysis require the same reasoning depth. Routing everything through the most powerful (and expensive) model is wasteful. Routing everything through the cheapest model produces unacceptable quality on complex reasoning tasks.

We built a routing layer that selects the appropriate model per task type, balancing reasoning quality, latency, and cost:

Quick risk detection — the initial traffic-light score that tells a user whether their contract has issues worth investigating — uses a fast, lightweight model. Sub-second response, near-zero marginal cost.
Full legal analysis — the detailed checklist with reasoning, citations, and risk matrix — routes to a model with stronger multi-step reasoning capabilities.
Complex multi-law scenarios — contracts spanning multiple legal domains — use models optimized for chain-of-thought cross-referencing.
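A routing layer along these lines can be as simple as a lookup from task type to model profile. The sketch below is one possible shape, not the production router: the `Task` enum, the `classify` rule, and the model names are all placeholders invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    QUICK_RISK = "quick_risk"        # traffic-light score
    FULL_ANALYSIS = "full_analysis"  # checklist, citations, risk matrix
    MULTI_LAW = "multi_law"          # contract spans several legal domains


@dataclass(frozen=True)
class ModelProfile:
    name: str  # placeholder identifier, not a real model name
    note: str


ROUTES = {
    Task.QUICK_RISK: ModelProfile("fast-model", "lightweight, low latency"),
    Task.FULL_ANALYSIS: ModelProfile("reasoning-model", "stronger multi-step reasoning"),
    Task.MULTI_LAW: ModelProfile("cross-reference-model", "cross-referencing across laws"),
}


def classify(full_report: bool, legal_domains: set[str]) -> Task:
    """Hypothetical rule: pick the task type from the request shape."""
    if not full_report:
        return Task.QUICK_RISK
    if len(legal_domains) > 1:
        return Task.MULTI_LAW
    return Task.FULL_ANALYSIS


def route(full_report: bool, legal_domains: set[str]) -> ModelProfile:
    return ROUTES[classify(full_report, legal_domains)]


print(route(False, {"tenancy"}).name)
print(route(True, {"tenancy"}).name)
print(route(True, {"labor", "commercial"}).name)
```

Keeping the routing table as data makes it easy to swap models per task without touching the analysis code.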

Why this matters economically: A freemium legal AI platform lives or dies on unit economics. If every free analysis is expensive, scaling the free tier becomes unsustainable. Intelligent routing keeps the free tier viable while reserving deeper reasoning for paying users. It's not just cost optimization — it's a product design decision that shapes the user experience at every tier.

Compliance as Architecture, Not Checklist

In regulated AI products, compliance is often treated as a final review step: build the product, then check the boxes. This approach fails because it produces architectures that are expensive to retrofit and compliance documentation that doesn't reflect actual system behavior.

For Bonus Iuri, compliance requirements shaped the architecture from day one:

GDPR data minimization drove the storage model. User documents are processed with minimal persistence. When storage is necessary, each user's data is structurally isolated — not just through access controls, but through the storage architecture itself. No cross-user data access is possible at the infrastructure level.

Right to erasure drove the data lifecycle. Account deletion triggers a complete cascade: documents, derived embeddings, and analysis records are permanently removed. Not a soft delete with eventual cleanup — immediate and irreversible.

EU AI Act transparency drove the output format. Every analysis includes clear disclosure of the AI systems involved, their limitations, and guarantees about data handling. This isn't a footer link to a general policy — it's an in-context disclosure attached to the output the user is reading.

CCBE (Council of Bars and Law Societies of Europe) ethics guidance drove the product positioning. The platform is explicitly a tool for legal analysis, not a replacement for legal counsel. Disclaimers are embedded in the user flow, not buried in terms of service.
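The right-to-erasure cascade described above can be pushed into the data model itself, so that deleting an account cannot leave derived records behind. The sketch below shows the idea with SQLite foreign keys. The schema and helper names are assumptions made for illustration, and a real deployment would also need to purge any external vector store, caches, and backups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    body TEXT NOT NULL
);
CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    vector BLOB NOT NULL
);
CREATE TABLE analyses (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    report TEXT NOT NULL
);
""")


def add_document(user_id: int, body: str) -> int:
    """Store a document plus placeholder derived records for one user."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO users (id) VALUES (?)", (user_id,))
        doc_id = conn.execute(
            "INSERT INTO documents (user_id, body) VALUES (?, ?)", (user_id, body)
        ).lastrowid
        conn.execute(
            "INSERT INTO embeddings (document_id, vector) VALUES (?, ?)", (doc_id, b"\x00")
        )
        conn.execute(
            "INSERT INTO analyses (document_id, report) VALUES (?, ?)", (doc_id, "placeholder")
        )
    return doc_id


def erase_user(user_id: int) -> None:
    """Hard delete: the cascade removes documents, embeddings, and analyses."""
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def count(table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


add_document(1, "contract A")
add_document(2, "contract B")
erase_user(1)
remaining = {t: count(t) for t in ("users", "documents", "embeddings", "analyses")}
print(remaining)
```

After `erase_user(1)`, only the second user's rows remain in every table; there is no separate cleanup job that could fail or lag behind.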

The investment: roughly one week of a six-week project. That's significant on a tight timeline. But by our estimate, retrofitting compliance into a non-compliant architecture would have cost two to three times as much and produced a weaker result.

Domain Pipelines Over Generic Prompts

The simplest approach to contract analysis is a single prompt: "Analyze this contract and identify risks." That approach produces generic, surface-level analysis — the AI equivalent of a law student's first reading.

We built specialized analysis pipelines for each contract type. Each includes:

Type-specific legislation mapping. An employment contract analysis references labor law. A rental analysis references tenancy law. The system retrieves from the relevant legal framework, not the entire corpus.
Domain-specific evaluation criteria. Each contract type has structured evaluation points derived from what a practicing Spanish lawyer would check — specific legal requirements with specific statutory references, not generic "check for risk" instructions.
Calibrated risk scoring. What constitutes "high risk" differs by contract type. A missing compensation clause in an employment contract is a legal violation. A missing SLA in a services contract is a negotiation concern. The scoring reflects these distinctions.
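The three components above can be captured as per-contract-type configuration. The sketch below is a simplified illustration: the `Pipeline` and `Check` types, the clause names, and the severity levels are assumptions for this example, and the mapping of services contracts to the Código Civil is an illustrative choice rather than something taken from the platform.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    NEGOTIATION_CONCERN = 1
    LEGAL_VIOLATION = 3


@dataclass(frozen=True)
class Check:
    clause: str                    # hypothetical clause identifier
    severity_if_missing: Severity  # calibrated per contract type


@dataclass(frozen=True)
class Pipeline:
    laws: tuple[str, ...]     # retrieval is restricted to these laws
    checks: tuple[Check, ...]


PIPELINES = {
    "employment": Pipeline(
        laws=("Estatuto de los Trabajadores",),
        checks=(Check("compensation", Severity.LEGAL_VIOLATION),),
    ),
    "rental": Pipeline(
        laws=("Ley de Arrendamientos Urbanos",),
        checks=(),  # evaluation points omitted in this sketch
    ),
    "services": Pipeline(
        laws=("Código Civil",),  # illustrative mapping, an assumption
        checks=(Check("sla", Severity.NEGOTIATION_CONCERN),),
    ),
}


def missing_clause_findings(contract_type: str, clauses_present: set[str]):
    """List each missing clause with the severity defined for this contract type."""
    pipeline = PIPELINES[contract_type]
    return [
        (check.clause, check.severity_if_missing)
        for check in pipeline.checks
        if check.clause not in clauses_present
    ]


print(missing_clause_findings("employment", {"trial_period"}))
print(missing_clause_findings("services", {"payment_terms"}))
```

The same kind of gap, a missing clause, yields a different severity depending on the contract type, which is the distinction a single generic prompt tends to lose.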

The quality difference is the gap between "this contract has some potential issues" and "clause 7.3 sets a trial period of 9 months, which exceeds the statutory maximum for qualified workers under the relevant article of the Estatuto de los Trabajadores."

You can see this level of specificity in action at bonusiuri.pro.

What This Means for Other Regulated Domains

The principles behind Bonus Iuri's AI engine aren't specific to legal tech. They apply to any AI product in a regulated domain:

Structure-aware retrieval — don't chunk domain documents arbitrarily. Understand their internal structure and preserve it.
Citation enforcement — if the AI can't ground a claim, it shouldn't make it. Traceability isn't optional in high-stakes domains.
Intelligent routing — match model capability to task requirements. Not every query needs your most expensive model.
Compliance-first architecture — build regulatory requirements into the data model and infrastructure, not into a review checklist.
Domain specialization — generic prompts produce generic results. Invest in domain-specific pipelines.

These aren't theoretical recommendations. They're the principles we applied to ship a production legal AI platform in six weeks — and they're directly transferable to healthcare, finance, insurance, and other domains where AI output has real consequences.
