
Multi-agent governance for regulated industries — planner, worker, auditor, and the audit trail

2026-05-02 · Michael English · Clonmel, Co. Tipperary

A single large language model behind a chat box is a demo. A swarm of agents acting on customer data inside a regulated business is a different beast, and the rules that already bind banks, insurers, healthcare providers and travel platforms in Ireland and the EU do not soften because the actor is statistical. If you want agents to do real work in a regulated context — book a room, move money, write to a record system, talk to a grieving user — you need a shape for the swarm that a regulator, an auditor, and your own DPO can read on a Monday morning. The shape we keep coming back to at IMPT is the same one that quietly works in classical engineering: planner, worker, auditor, with an audit trail you would be happy to hand to the DPC.

The problem with one big model doing everything

The temptation is to wire a frontier model directly to your tools and let it think out loud. It works in a notebook. It does not survive contact with GDPR, the EU AI Act, DORA, NIS2, the Consumer Protection Code, or whatever sectoral overlay applies to your domain. The reasons are practical, not philosophical.

One model doing everything has no separation of duties. The same context window decides what to do, does it, and writes the note about having done it. If something goes wrong — a wrong booking, a wrong payout, a wrong piece of advice to a vulnerable user — you cannot tell whether the failure was in the plan, the act, or the record of the act. You cannot point at the artefact that should have caught it, because there isn't one. That is fine for a toy. It is not fine for anything a regulator can fine you for.

Multi-agent governance is, at its heart, the engineering response to that problem. Split the cognition into roles whose outputs are inspectable on their own, and make those roles answer to each other in a fixed order.

Planner, worker, auditor — the minimum viable swarm

The pattern we use, and the one I would defend in front of any compliance officer, has three roles. They are not the only roles you will end up with, but they are the floor.

The planner takes the user intent and the policy bundle and produces a plan. Not prose — a structured plan. Steps, tools, expected inputs and outputs, the policies it believes apply, and an explicit statement of what it will not do. The planner does not touch tools. It writes a document and signs it.
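
As a sketch of what "a document, signed" can mean in practice, here is a minimal plan shape in Python. The field names, and the SHA-256 content hash standing in for a real signature, are illustrative assumptions, not a standard:

    import hashlib
    import json
    from dataclasses import asdict, dataclass

    @dataclass(frozen=True)
    class PlanStep:
        tool: str                # must name a tool in the versioned catalogue
        arguments: dict          # declared inputs, with bounds the worker may not exceed
        expected_output: str     # what the planner believes the step returns

    @dataclass(frozen=True)
    class Plan:
        intent: str              # the user intent, verbatim
        policy_bundle_hash: str  # version hash of the policy bundle the planner read
        steps: tuple             # ordered PlanSteps
        will_not_do: tuple       # explicit exclusions, stated up front

    def sign(plan: Plan, signer_id: str) -> str:
        # A content hash stands in for a signature; production would use HMAC or asymmetric keys.
        payload = json.dumps(asdict(plan), sort_keys=True)
        return hashlib.sha256(f"{signer_id}:{payload}".encode()).hexdigest()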

The worker takes that signed plan and executes the steps. It is allowed to call tools, but only the tools the plan named, with arguments inside the bounds the plan declared. If reality drifts — a hotel inventory call returns something unexpected, a payment endpoint times out — the worker does not improvise outside the plan. It returns to the planner for an amendment, and the amendment is itself signed and logged.
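
A minimal worker loop under those rules, assuming the Plan shape sketched above. TOOLS and AmendmentRequired are illustrative names; the one invariant is that drift never becomes improvisation:

    class AmendmentRequired(Exception):
        """Reality drifted from the plan; the planner must issue a signed amendment."""

    TOOLS = {}  # name -> callable; the only actions the worker can ever take

    def execute(plan, trail):
        for i, step in enumerate(plan.steps):
            if step.tool not in TOOLS:
                raise AmendmentRequired(f"step {i}: {step.tool!r} is not in the catalogue")
            try:
                result = TOOLS[step.tool](**step.arguments)
            except Exception as exc:
                # No improvisation: the failure is logged and the plan goes back for amendment.
                trail.append({"step": i, "tool": step.tool, "error": repr(exc)})
                raise AmendmentRequired(f"step {i} failed: {exc}") from exc
            trail.append({"step": i, "tool": step.tool,
                          "arguments": step.arguments, "result": result})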

The auditor never executes. It reads. It checks the plan against policy before the worker is allowed to start, and it checks the worker's actions against the plan after each step. The auditor's job is to say stop. It has a veto, and the veto is itself an artefact that goes into the trail.

That is the spine. Planner proposes, worker acts, auditor assents. Three different prompts, three different model calls, often three different models with different temperature settings and different system instructions, all glued together by a deterministic orchestrator that you wrote and that you can put on a witness stand.
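
The spine reduces to a small deterministic loop. This is a sketch, not a framework: plan_fn, audit_fn and step_fn stand in for the three separate model calls, and the orchestrator itself contains nothing a reviewer could not step through by hand:

    from enum import Enum

    class Verdict(Enum):
        PASS = "pass"
        FAIL = "fail"
        FAIL_WITH_CONDITION = "fail-with-condition"

    def run_transaction(intent, policies, plan_fn, audit_fn, step_fn, trail):
        plan = plan_fn(intent, policies)          # planner: writes and signs, never acts
        trail.append(("plan", plan))
        preflight = audit_fn(plan, policies)      # auditor: pre-flight check
        trail.append(("preflight", preflight))
        if preflight is not Verdict.PASS:
            return trail                          # the veto is itself an artefact
        for step in plan["steps"]:
            result = step_fn(step)                # worker: one bounded tool call
            trail.append(("step", step, result))
            verdict = audit_fn(result, policies)  # auditor: per-step check
            trail.append(("verdict", verdict))
            if verdict is Verdict.FAIL:
                break                             # stop means stop
        return trail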

What a compliant AI architecture actually looks like

The agents are the interesting part. The boring part is what makes it legal.

Around the swarm you need a deterministic shell. A policy engine that loads the relevant rules — GDPR lawful basis, AI Act risk tier, sectoral rules, your own internal codes — and gives the planner and auditor a machine-readable view of them. A tool layer where every callable action is wrapped, rate-limited, and signed by the worker's identity. A memory layer that distinguishes ephemeral working memory from durable record. And a clock and a hash chain, because without ordered, tamper-evident events you have a story, not an audit trail.
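
The clock and the hash chain are a few lines, not a platform. A minimal sketch of an append-only, hash-chained event log follows; in production you would also anchor the head hash somewhere the writing process cannot reach:

    import hashlib
    import json
    import time

    class HashChainLog:
        def __init__(self):
            self.records = []
            self.head = "genesis"

        def append(self, event: dict) -> str:
            record = {"ts": time.time(), "prev": self.head, "event": event}
            digest = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()).hexdigest()
            record["hash"] = digest
            self.records.append(record)
            self.head = digest
            return digest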

None of this is exotic. We have been building this kind of plumbing for two decades — I spent most of mine doing it at Tesco, Dunnes Stores and Oracle before IMPT. What is new is that the thing inside the shell is non-deterministic. The shell has to assume the agent is a junior staff member who has read every book ever written, never sleeps, and occasionally hallucinates a confident answer. You design the controls accordingly.

The audit trail is the product

If you take one thing from this piece, take this: in a regulated industry, the AI audit trail is not a side effect of the agent doing its job. It is the job. The completed booking, the approved claim, the answer to the user — those are outputs. The trail is the artefact that makes the output defensible.

A trail that earns its keep has, at minimum, the following; a minimal record shape is sketched after the list:

  • The exact user intent as received, with the timestamp and the channel.
  • The policy bundle that was loaded, by version hash.
  • The planner's plan, with the model identifier, the prompt, and the full output.
  • The auditor's pre-flight assessment, including any conditions it attached.
  • Each tool call the worker made, with arguments, response, latency, and the worker's reasoning trace at the point of call.
  • The auditor's per-step assessments and any vetoes.
  • The final state, the user-visible output, and a hash linking back to the previous record so the chain is tamper-evident.
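
As one structured record, the list above might look like this (the field names are illustrative; the structure is the point):

    from typing import TypedDict

    class TrailRecord(TypedDict):
        intent: str              # exact user intent as received
        received_at: str         # ISO-8601 timestamp
        channel: str             # e.g. "web", "api", "voice"
        policy_bundle_hash: str  # version hash of the loaded policy bundle
        plan: dict               # model identifier, prompt, full planner output
        preflight: dict          # auditor's pre-flight assessment and any conditions
        tool_calls: list         # per call: arguments, response, latency, reasoning trace
        step_verdicts: list      # auditor's per-step assessments and vetoes
        final_state: dict
        user_output: str
        prev_hash: str           # link to the previous record; breaks if anything is rewritten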

Two properties matter most. It must be append-only, so nothing can be quietly rewritten after the fact. And it must be replayable, so a year later you can pull a single transaction, reload the same policy version, and see exactly why the swarm did what it did. If you cannot replay, you cannot defend.
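
Verification is mechanical, which is the point. Here is a sketch that matches the HashChainLog above; replay is then just re-running the deterministic orchestrator with the recorded intent, the recorded policy version, and tool responses served from the trail instead of the live endpoints:

    import hashlib
    import json

    def verify_chain(records) -> bool:
        prev = "genesis"
        for record in records:
            body = {k: v for k, v in record.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if record["prev"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True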

Where Irish and EU law actually lands on this

I am not a lawyer, and this is not legal advice. But the direction of travel is clear enough to plan against. The AI Act attaches documentation, logging, human-oversight and post-market monitoring obligations to higher-risk uses, and those obligations map almost cleanly onto the planner-worker-auditor pattern. GDPR Article 22 and the surrounding case law mean that if your swarm makes a decision with legal or similarly significant effect on a person, you need a meaningful human in the loop — and "meaningful" means the human can actually see what the agent did and why. DORA pushes operational resilience and traceability into financial services. NIS2 pulls similar obligations into a wider set of sectors. Sector codes — the Consumer Protection Code, MiFID, IDD, the various health and travel rules — sit on top.

None of these regimes were written with multi-agent systems in mind. All of them are perfectly happy to be satisfied by one. A swarm where the planner cites the policy, the auditor enforces it, the worker is bounded to declared tools, and the trail is hash-chained and replayable, is a swarm you can put in front of a regulated workload without flinching.

Failure modes we have already learnt to design against

A few patterns worth naming, because they will catch you otherwise.

The chatty auditor. If your auditor is also a model, and you let it negotiate with the planner, it will eventually be talked round. Auditors must have a constrained output surface — pass, fail, fail-with-condition — and they must not see the planner's persuasion. They see the plan and the policy. Nothing else.
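
Concretely, the constraint lives in the orchestrator, not in the prompt. A sketch, reusing the Verdict values from the orchestrator sketch and repeated here so the block stands alone; note that an unparseable reply fails closed:

    from enum import Enum

    class Verdict(Enum):
        PASS = "pass"
        FAIL = "fail"
        FAIL_WITH_CONDITION = "fail-with-condition"

    def audit(plan: dict, policies: dict, model_call) -> Verdict:
        reply = model_call(plan=plan, policies=policies)  # the model sees nothing else
        try:
            return Verdict(reply.strip().lower())
        except ValueError:
            return Verdict.FAIL                           # unparseable means fail closed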

Tool sprawl. Every tool you add to the worker's surface is a new way to fail. Keep the tool catalogue small, version it, and require the planner to declare tool intent before the auditor approves.
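
A catalogue small enough to review is also small enough to write down. A sketch, with hypothetical stub callables standing in for real tool wrappers:

    CATALOGUE_VERSION = "tools-v3"  # illustrative version tag; bumped on every change

    CATALOGUE = {
        "hotel.search": lambda **kw: {"status": "stub"},  # hypothetical stubs
        "hotel.book":   lambda **kw: {"status": "stub"},
    }

    def plan_tools_declared(plan_steps) -> bool:
        """Pre-flight: every tool the plan names must exist in this catalogue version."""
        return all(step["tool"] in CATALOGUE for step in plan_steps)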

Memory leakage. Long-running agents accumulate context. Without discipline, personal data from one session leaks into another. Working memory must be scoped to the transaction and destroyed at its close. Durable memory must be deliberate, lawful, and minimised.
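
One way to make that discipline structural rather than behavioural is to scope working memory with the language itself. A sketch:

    from contextlib import contextmanager

    @contextmanager
    def working_memory():
        memory = {}
        try:
            yield memory    # everything the agents accumulate lives here...
        finally:
            memory.clear()  # ...and is gone when the transaction closes

    # Usage: personal data never outlives the block.
    # with working_memory() as mem:
    #     mem["guest_name"] = "(lives only for this transaction)"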

Silent retries. If the worker retries a failed tool call, that retry is part of the trail. Hidden retries are how you end up double-charging a customer and being unable to explain why.
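
The fix is to put the retry loop where the trail can see it. A sketch:

    import time

    def call_with_retries(tool, args, trail, attempts=3, backoff=1.0):
        for attempt in range(1, attempts + 1):
            try:
                result = tool(**args)
                trail.append({"attempt": attempt, "args": args, "result": result})
                return result
            except Exception as exc:
                trail.append({"attempt": attempt, "args": args, "error": repr(exc)})
                if attempt == attempts:
                    raise                      # the final failure surfaces, fully logged
                time.sleep(backoff * attempt)  # visible, bounded backoff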

What we are doing about it at IMPT

We are building IMPT's forthcoming AI-native booking agent on exactly this spine. A booking touches payments, personal data, supplier contracts, consumer law, and our own carbon commitment — every booking we take offsets a tonne of CO₂ on-chain, paid from our commission, and that promise has to be honoured by the agent the same way it is honoured by the website. Planner, worker, auditor; bounded tools; replayable trail. The same discipline carries into Bro AI, where the user is grieving and the cost of a bad answer is not measured in euros, and into the Ireland Quantum facility we have committed to deliver, where the audit obligations on sovereign compute will be heavier still.

If you are standing up agents in a regulated business this week, do one thing: stop building one agent that does everything, and draw the line between the role that decides, the role that acts, and the role that says no. The trail you keep from that moment forward is the one you will be glad of.

Reading these regularly?

Subscribe to Letters from Clonmel — quarterly long-form founder letters from Mike. First letter Q2 2026.
