
Data residency for Irish legal AI — the questions to ask


Every Irish solicitor I've spoken to in the last six months has the same question buried somewhere in the conversation: "Where does the data actually go?" They're right to ask. The honest answer for most cloud-based legal AI tools is some combination of "it depends", "we can't tell you exactly", and "trust us". That isn't good enough when you're handling a client's separation papers, a commercial M&A draft, or a Garda interview transcript. This article is the list of questions I'd be asking any vendor — including me — before letting a model anywhere near privileged material.

Why data residency is a harder question than it looks

Data residency sounds like a single yes/no question: is the data in Ireland or not? In practice it's at least four overlapping questions: where the data is stored at rest, where it's processed (often a different machine, sometimes a different jurisdiction), where the model weights live, and where the logs, telemetry, and embeddings end up. A vendor can truthfully say "your documents are stored in Dublin" while the inference call that reads them runs in Virginia and the prompt logs sit in a US observability platform for ninety days.

The GDPR doesn't actually forbid international transfers. It requires a lawful basis, appropriate safeguards, and — post-Schrems II — a transfer impact assessment that takes account of the destination country's surveillance laws. That's the part most legal AI vendors gloss over. The Data Protection Commission has been clear that Standard Contractual Clauses on their own aren't a magic wand if the recipient country can compel disclosure. For a US-headquartered SaaS vendor processing Irish legal data, that compulsion risk is non-trivial under FISA Section 702 and the CLOUD Act, regardless of where the storage bucket happens to live.

The seven questions to ask a legal AI vendor

Print this list. Email it to anyone trying to sell you a legal AI product. If they can't answer all seven in writing, they're not ready for regulated work.

  • Where are documents stored at rest, and in which legal entity's name? "EU region" isn't enough — you want a country, a data centre operator, and the contracting entity. An Irish-incorporated subsidiary of a US parent is still subject to US extraterritorial reach.
  • Where does inference run? The GPU that actually executes the model. Not the API gateway, not the storage — the compute. If they're using OpenAI, Anthropic, or Google as a backend, name the region and the contractual terms covering it.
  • Are prompts and completions logged, and where? Most LLM providers log inputs and outputs by default for abuse monitoring. Some let you turn it off contractually (Zero Data Retention agreements). Get that in writing, with the retention period.
  • Is the data used for training? Even one round of fine-tuning on client documents is a disclosure event. The answer should be an unambiguous no, in the contract, with audit rights.
  • Who at the vendor can read the data? Support staff? ML engineers debugging an issue? What's the access control, and is access logged in a way you can subpoena later?
  • What happens to embeddings and vector indexes? These are derived data but they can leak content. They count as personal data under GDPR if they relate to identifiable individuals. Where do they live, and for how long?
  • What's the sub-processor list, and how are you notified of changes? A legal AI product typically has five to fifteen sub-processors — auth, storage, compute, observability, email, analytics. Each one is a potential exfiltration point.
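If it helps to make the scoring concrete, the seven questions can be reduced to a simple scorecard. The question keys and pass criteria below are my own illustrative sketch of what an acceptable written answer looks like, not any formal standard:

```python
# Illustrative scorecard for the seven residency questions.
# The keys and pass criteria are my own sketch, not a standard.

ACCEPTABLE = {
    "storage":        lambda a: a.get("country") == "IE" and a.get("entity_named"),
    "inference":      lambda a: a.get("region_named") and a.get("backend_named"),
    "logging":        lambda a: a.get("zero_retention_in_writing"),
    "training":       lambda a: a.get("contractual_no"),
    "human_access":   lambda a: a.get("access_logged"),
    "embeddings":     lambda a: a.get("location_named") and a.get("retention_named"),
    "sub_processors": lambda a: a.get("list_published"),
}

def score_vendor(answers: dict) -> list[str]:
    """Return the questions a vendor fails; an empty list means all seven pass."""
    return [q for q, ok in ACCEPTABLE.items()
            if not ok(answers.get(q, {}))]

# A typical vendor response: storage and training answered well,
# inference vague, everything else missing in writing.
vendor = {
    "storage":   {"country": "IE", "entity_named": True},
    "inference": {"region_named": False, "backend_named": True},
    "training":  {"contractual_no": True},
}
print(score_vendor(vendor))
# → ['inference', 'logging', 'human_access', 'embeddings', 'sub_processors']
```

Anything in that failing list is a question to put back to the vendor in writing before the demo, not after.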

Storage in Ireland is the easy part — processing is where it leaks

I want to dwell on inference because this is where most vendors are quietly misleading. A typical "EU-hosted" legal AI stack looks like this: a Next.js app in an EU region, a Postgres database in Frankfurt or Dublin, document storage in S3 eu-west-1. So far so good. Then the user asks a question. The application calls an LLM API. That call is, ninety percent of the time, going to a US-based foundation model — sometimes routed via an "EU endpoint" that is really a proxy, sometimes routed direct.

If the model itself runs on US infrastructure, the prompt — which contains your client's document content — has been transferred to the US for processing. That's a transfer under GDPR Article 44, and you need a lawful basis and a transfer impact assessment for it. The fact that the response comes back and gets stored in Dublin doesn't undo the transfer.
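One way to see the problem is to write the stack down component by component and flag every non-EEA hop. The component names and countries below are illustrative, mirroring the "EU-hosted" example above; the EEA set is abbreviated for the sketch:

```python
# Flag Chapter V transfers in a component-by-component map of the stack.
# Components and regions are illustrative assumptions, not a real vendor.

EEA = {"IE", "DE", "FR", "NL", "SE", "ES", "IT"}  # abbreviated for the sketch

stack = {
    "app (Next.js)":       "DE",
    "database (Postgres)": "IE",
    "documents (S3)":      "IE",   # eu-west-1 is Dublin
    "LLM inference":       "US",   # the call that actually reads the prompt
    "prompt logs":         "US",   # observability platform
}

def transfers(stack: dict[str, str]) -> list[str]:
    """Components processing data outside the EEA — each needs an
    Article 44 lawful basis and a transfer impact assessment."""
    return [component for component, country in stack.items()
            if country not in EEA]

print(transfers(stack))
# → ['LLM inference', 'prompt logs']
```

The storage rows passing doesn't clear the list: the two US entries are exactly the transfers the "EU-hosted" label hides.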

The technical fix is to run inference inside the EU, ideally inside Ireland, on infrastructure operated by an entity not subject to US extraterritorial law. That's harder than it sounds because the best frontier models are American and their EU deployments are typically operated by their US parent. The pragmatic options are: (a) a genuinely EU-operated model deployment with contractual protections, (b) a self-hosted open-weight model on EU compute, or (c) on-premise — the model runs on hardware physically inside your office or your firm's data centre.
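Options (b) and (c) look the same at the application layer: the app points its inference call at a host you control instead of a US API. A minimal sketch, assuming a self-hosted server exposing an OpenAI-compatible chat endpoint (as llama.cpp's server and Ollama both do); the address and model name are my own placeholders:

```python
# Sketch of options (b)/(c): inference against a self-hosted endpoint.
# The endpoint URL and model name are assumptions — substitute your own
# deployment (e.g. a llama.cpp server or Ollama on firm hardware).
import json
from urllib.parse import urlparse
from urllib.request import Request

ENDPOINT = "http://192.168.1.50:8080/v1/chat/completions"  # box in the server room

def build_request(prompt: str) -> Request:
    """Build the inference call. Because the host is on the firm's own
    network, the prompt never crosses a jurisdictional boundary."""
    body = json.dumps({
        "model": "local-model",  # whatever model your server has loaded
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(ENDPOINT, data=body,
                   headers={"Content-Type": "application/json"})

req = build_request("Summarise the attached deed of transfer.")
host = urlparse(req.full_url).hostname
print(host, host.startswith("192.168."))  # private-range address: no transfer
```

The residency analysis collapses to inspecting that one URL: if every inference call resolves to an address inside your own network, there is no Article 44 question to answer for the prompt content.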

The on-premise argument, and where it actually applies

I'm biased here because I build on-premise systems, so take this with appropriate salt. On-premise solves the residency question by removing it. If the model weights, the inference compute, the document storage, and the logs are all on a box in your server room, there is no transfer to assess. The DPC's guidance becomes much simpler. The Schrems II analysis becomes a one-liner.

The honest counter-argument is that on-premise is more capital-intensive up front, requires a competent IT partner to maintain, and the open-weight models you can run locally are generally a step behind the frontier ones. For a fifty-solicitor commercial firm doing complex M&A drafting, that gap might matter. For a six-solicitor general practice doing wills, conveyancing, and family law, it almost certainly doesn't — the models that fit on a single workstation GPU now are more than capable for those tasks.

The other thing on-premise gives you that cloud doesn't is verifiable data residency. You can walk into the room. You can pull the network cable and the system still works. That's a different category of assurance than a contractual promise from a vendor. If you want to see how I've structured this for Irish practices specifically, I've written more about it on the Intelligence Brain for legal page.

What the Law Society and the DPC actually expect

The Law Society's guidance on cloud computing, which predates the current AI wave but applies to it directly, requires solicitors to satisfy themselves that confidentiality is preserved, that they retain control of the data, and that they can comply with their obligations to clients and the courts. That last point is interesting — it includes being able to produce documents on demand and being able to demonstrate that privilege has not been waived.

If a US vendor receives a CLOUD Act disclosure order and hands over your client's documents without telling you, you've potentially waived privilege without knowing it. You also can't comply with a discovery request properly because you don't know what's been disclosed. That's not a hypothetical — it's the actual mechanism that makes US-hosted legal AI a problem for regulated Irish practice.

The DPC's expectations, layered on top, are that you've done a Data Protection Impact Assessment for any high-risk processing (and AI on legal documents qualifies), that you've documented your transfer impact assessments, and that your sub-processor chain is mapped. The DPC has been escalating enforcement on exactly this kind of documentation gap.

A practical residency checklist for the next vendor meeting

Before the demo, send the seven questions above. Score the answers. If "where does inference run" gets a vague reply, end the call. If "is the data used for training" gets anything other than a contractual no, end the call. If the sub-processor list isn't published, end the call.

During the demo, ask to see the architecture diagram with country flags on every box. Storage, compute, inference, logs, backups, observability, auth. If they don't have one, they haven't thought about it. Ask where the encryption keys are held — if the vendor holds them, they can read your data, full stop, regardless of "encryption at rest" marketing copy. Customer-managed keys in an Irish HSM are the answer you want.
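The key-custody point reduces to a single check, which is why the marketing language around it is so slippery. A toy model, with field names that are my own illustration:

```python
# Toy model of the key-custody question: whoever holds the decryption key
# can read the data, regardless of "encryption at rest". Field names are
# illustrative, not any vendor's actual configuration schema.

def vendor_can_read(config: dict) -> bool:
    """True if the vendor can decrypt your documents. Encryption at rest
    with vendor-held keys protects against a stolen disk, not against the
    vendor — or anyone who can compel the vendor."""
    return config["key_holder"] == "vendor"

marketing_copy = {"encrypted_at_rest": True, "key_holder": "vendor"}
what_you_want  = {"encrypted_at_rest": True, "key_holder": "customer",
                  "hsm_location": "IE"}

print(vendor_can_read(marketing_copy), vendor_can_read(what_you_want))
# → True False
```

Both configurations are truthfully described as "encrypted at rest"; only one of them keeps the vendor out.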

After the demo, ask for the DPA, the sub-processor list, the transfer impact assessment, and the SOC 2 or ISO 27001 report. A serious vendor has these ready. A vendor who needs three weeks to "put something together" is telling you something important.

Where to start this week

Pull up your current AI tools — including the ones individual solicitors are using on their own laptops, because that's where the real exposure is — and answer the seven questions for each one. You'll find at least one tool you can't answer for. That's your immediate risk. Decide whether to migrate it, restrict its use to non-client data, or replace it. If you want to see how the residency question is solved by removing it entirely, the Intelligence Brain overview walks through the on-premise architecture. Either way, the worst position is the one most Irish firms are in right now: using cloud legal AI without being able to answer where the data is. That's the position you want to be out of by month-end.

Book a 30-minute assessment

Direct with Michael. No charge. No pitch deck.

Pick a slot →