Tool-layer authorisation for legal AI — read everything, sign nothing

2026-05-02 · Michael English · Clonmel, Co. Tipperary

A legal AI agent that can read every document in a firm's matter management system is useful. A legal AI agent that can also file motions, sign engagement letters, or send client correspondence on its own is a liability with a search bar. The gap between those two states is not a matter of model capability or prompt discipline. It is an authorisation problem, and it belongs at the tool layer — not in the prompt, not in the system message, and certainly not in the model's good intentions.

I have been building agentic systems long enough to know that the interesting work in AI safety for regulated industries is not in the model. It is in the boring scaffolding around it: which functions can be called, by whom, under what conditions, with what evidence of human consent. Get that right and you can hand an agent extraordinary read-side power without losing a night of sleep. Get it wrong and one hallucinated filename can become a filed pleading.

The asymmetry that matters: read versus write

Every action an AI agent can take falls on one side of a line. On one side: read, summarise, search, compare, draft, suggest, score, route. On the other: file, sign, send, pay, publish, transmit, commit. The first set is reversible or inert. The second set creates obligations, moves money, or speaks on behalf of a person or a firm.

Legal AI vendors and in-house teams keep getting this asymmetry wrong in two directions. Some teams are so spooked by the write side that they neuter the read side as well, which is how you end up with a chatbot that cannot find a clause in a contract you uploaded thirty seconds ago. Other teams, eager to demo end-to-end "agentic" workflows, wire up writes too early and discover that "human in the loop" was a slide, not an architecture.

The principle is simple. Reads are cheap and recoverable. Writes are expensive and, in legal work, sometimes irrevocable. The authorisation surface should reflect that asymmetry directly.

Why the prompt is not the place for this

"Do not file the motion without asking me first" is not a security control. It is a wish. Models drift, prompts get truncated, context windows churn, and the same instruction that holds for the first nine turns of a conversation can quietly evaporate on the tenth. Anyone who has watched an agent confidently invent a tool call it did not actually have permission to make has seen the failure mode.

The same applies to system messages, role instructions, and the various "guardrail" libraries that bolt onto the model output. They are useful for shaping behaviour. They are not where you put a control that, if breached, costs a client a missed deadline or a regulator a phone call.

The control belongs underneath the model, at the layer the model cannot see and cannot lie about: the tool boundary itself. If the function to file with a court registry simply does not exist in the agent's tool list — or exists but refuses to execute without a verified human signature — then no amount of prompt confusion can produce a filing.
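
To make that concrete, here is a minimal sketch of a tool registry enforcing the boundary in Python. Every name in it is illustrative rather than a real framework API: the agent's tool list is whatever the registry exposes, and a commit-class tool refuses to run without a verified approval, whatever the prompt says.

```python
# A minimal sketch of gating at the tool boundary. All names here
# (ToolRegistry, verify_approval, ...) are illustrative, not a real API.

class ToolNotExposedError(Exception):
    pass

def verify_approval(token, tool_name, args) -> bool:
    # Placeholder: a real implementation verifies a cryptographic signature
    # over the exact payload (see the human-in-the-loop section below).
    return False

class ToolRegistry:
    """The model only ever sees the tools this registry chooses to expose."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn, requires_approval=False):
        self._tools[name] = (fn, requires_approval)

    def exposed(self):
        # This list is the agent's entire world of possible actions.
        return sorted(self._tools)

    def invoke(self, name, approval_token=None, **kwargs):
        if name not in self._tools:
            # A tool the model hallucinates simply does not resolve.
            raise ToolNotExposedError(name)
        fn, needs_approval = self._tools[name]
        if needs_approval and not verify_approval(approval_token, name, kwargs):
            # The refusal lives below the model, immune to prompt drift.
            raise PermissionError(f"{name} requires a verified human approval")
        return fn(**kwargs)
```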

Tool-layer authorisation, in plain terms

Tool-layer authorisation means that every action the agent can take is exposed as an explicit, named function with its own permissions, its own preconditions, and its own audit trail. The model does not have ambient access to your systems. It has a list of tools. Each tool decides, on each invocation, whether the caller is allowed to do this specific thing, on this specific record, on behalf of this specific user, right now.

For a legal AI workspace, that decomposition tends to produce roughly three tiers of tool:

  • Read tools — search the matter, fetch a document, list versions, summarise correspondence. These run with the user's read scope. They log who asked and what came back, but they do not require step-up approval.
  • Draft tools — produce a redline, generate a memo, propose a response. The output is written somewhere the human can see it, but it is parked in a draft state. Nothing leaves the building.
  • Commit tools — file, sign, send, pay, publish. These are the dangerous ones. They require a fresh, explicit, human-attested action that the system can prove happened.

The agent can plan freely across all three tiers. It can reason about what it would file, draft what it would send, and explain what it would sign. It just cannot pull the trigger on the third tier without a human pulling it with them.
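
Sketched as data, assuming a Python tool layer and invented tool names, the tiering looks something like this:

```python
# Hypothetical tool declarations for the three tiers. Tool names are invented.
from enum import Enum, auto

class Tier(Enum):
    READ = auto()    # reversible or inert; runs with the user's read scope
    DRAFT = auto()   # output parked in a draft state; nothing leaves the building
    COMMIT = auto()  # creates obligations; needs a fresh human attestation

TOOLS = {
    "search_matter":       Tier.READ,
    "fetch_document":      Tier.READ,
    "draft_redline":       Tier.DRAFT,
    "draft_client_memo":   Tier.DRAFT,
    "file_motion":         Tier.COMMIT,
    "send_correspondence": Tier.COMMIT,
}

def may_execute(tool: str, has_fresh_approval: bool) -> bool:
    # The agent plans across all three tiers; it executes the third only
    # when a human is pulling the trigger with it.
    return TOOLS[tool] is not Tier.COMMIT or has_fresh_approval
```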

Policy as code: the pattern that enforces it

Once you accept that the tool boundary is the right place for AI authorisation, the question becomes how to express the rules without scattering if statements through every tool implementation. The answer is the policy-as-code pattern AI teams are slowly converging on: declarative authorisation, evaluated by a dedicated engine, called the same way from every tool.

The shape is consistent regardless of which engine you reach for. You define policies as data — usually in a small, purpose-built language — and you ask the engine a question on every tool call: can this principal perform this action on this resource, given this context? The engine returns allow or deny, plus the reasons, and the tool either proceeds or refuses.
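
A sketch of that call pattern, assuming a Python tool layer and a deny-by-default stand-in for the engine (a real deployment would delegate the check to a dedicated engine such as OPA or Cedar):

```python
# The uniform call pattern: every tool asks the same four-part question of a
# dedicated engine and obeys the answer. Names here are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    allow: bool
    reasons: tuple[str, ...] = ()

class PolicyEngine:
    """Stand-in for a real engine; deny-by-default, like the real thing."""
    def check(self, principal: str, action: str, resource: str,
              context: dict) -> Decision:
        return Decision(allow=False, reasons=("no matching allow rule",))

def fetch_document(engine: PolicyEngine, user: str, doc_id: str,
                   ctx: dict) -> bytes:
    # The identical question, asked the identical way, on every invocation.
    decision = engine.check(user, "document:read", doc_id, ctx)
    if not decision.allow:
        # The tool holds no policy of its own; it relays the engine's reasons.
        raise PermissionError("; ".join(decision.reasons))
    return b"..."  # placeholder for the actual document fetch
```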

For legal AI, a usable policy set tends to look like this:

  1. Identity and scope. Who is the human behind the agent session? Which matters are they on? Which clients have they been conflict-checked against? The agent inherits no more than the human's own access.
  2. Action class. Is this a read, a draft, or a commit? Commit actions require a separate, recent, signed assertion — a fresh approval token, not a session cookie from an hour ago.
  3. Resource sensitivity. A privileged communication, a sealed filing, a settlement figure under embargo — these get tagged at ingest and the policy refuses to surface them through tools that were not designed for that sensitivity tier.
  4. Counterparty rules. No outbound communication tool can address an external party without a human-confirmed recipient list. The agent can propose recipients. It cannot finalise them.
  5. Jurisdictional rules. Filings, signatures, and disclosures sit under jurisdiction-specific constraints. The policy engine, not the model, holds the table of what each jurisdiction requires.
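
Here is what the first three of those rule classes might look like expressed as data, in a deliberately toy Python form. A production system would write them in the engine's own policy language, and every rule id and field name below is invented:

```python
# A toy policy set as data, assuming a deny-by-default evaluator.
POLICIES = [
    {   # 1. Identity and scope: the agent inherits the human's matter access.
        "id": "matter-scope",
        "allow_if": lambda p, a, r, c: r["matter"] in p["matters"],
        "reason": "principal is not assigned to this matter",
    },
    {   # 2. Action class: commits need a fresh, verified approval token.
        "id": "fresh-approval-for-commit",
        "allow_if": lambda p, a, r, c: a["class"] != "commit"
                                       or c.get("approval_verified", False),
        "reason": "commit actions require a fresh human approval",
    },
    {   # 3. Resource sensitivity: privileged material stays off ordinary tools.
        "id": "privilege-tier",
        "allow_if": lambda p, a, r, c: not r.get("privileged")
                                       or a.get("privilege_aware", False),
        "reason": "privileged material cannot surface through this tool",
    },
]

def evaluate(principal, action, resource, context):
    """Return (allow, reasons). Readable by a partner; changeable in one place."""
    reasons = [rule["reason"] for rule in POLICIES
               if not rule["allow_if"](principal, action, resource, context)]
    return (not reasons, reasons)
```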

The reason this pattern is worth the effort is that it makes the rules auditable. A regulator, a partner, or an insurer can read the policy file. They cannot read a model's weights. When something goes wrong — and at scale, something always goes wrong — you can point to the exact rule that fired or failed to fire, and you can change it in one place rather than across a dozen prompt templates.

Human in the loop, designed properly

"Human in the loop" has become the kind of phrase that means everything and nothing. In a tool-layer model it has a precise meaning: a commit tool will not execute until a human has reviewed the exact payload the tool is about to send, in the form it will be sent, and has signed an approval that the tool can verify cryptographically.

That last part is what most implementations skip. The human approval has to be over the actual bytes that will leave the system, not over a summary the agent wrote of what it is about to do. Otherwise you get the classic gap: the lawyer approves "send the standard NDA to Acme" and the agent, having reasoned creatively, sends a non-standard NDA to Acme Holdings instead of Acme Ltd. The policy engine should refuse any commit whose payload does not match the payload that was approved.
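
A minimal sketch of approval-over-the-exact-bytes, using an HMAC as a stand-in for what would normally be an asymmetric signature from the approver's own key, and omitting the expiry and nonce a real approval token needs:

```python
# Approval is bound to a digest of the actual outbound bytes, not to a
# summary the agent wrote of what it intends to do. Helper names are invented.
import hashlib
import hmac

def payload_digest(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()

def sign_approval(approver_key: bytes, digest: str) -> str:
    # The human reviews the payload, then signs its digest.
    return hmac.new(approver_key, digest.encode(), hashlib.sha256).hexdigest()

def commit_allowed(approver_key: bytes, approved_sig: str,
                   outbound: bytes) -> bool:
    # Recompute from the bytes about to leave the system. If anything changed
    # after approval (recipient, attachment, a single clause), the digests
    # differ and the commit refuses.
    expected = sign_approval(approver_key, payload_digest(outbound))
    return hmac.compare_digest(expected, approved_sig)
```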

Designed properly, this is not friction. The human spends their attention on the small set of high-stakes moments where attention is the entire point of their job. The agent does the rest of the work — the reading, the cross-referencing, the drafting, the chasing — without ever being in a position to act unilaterally.

What this looks like in practice

A few habits tend to separate teams who get agent permissions right from those who do not.

  • Tools are nouns and verbs, not capabilities. file_motion is a tool. "Access to the court system" is not. The narrower the tool, the easier the policy.
  • Drafts are first-class. Every commit tool has a draft sibling. The draft sibling is the only path the agent can take on its own.
  • The policy engine is out of process. If the same code that calls the tool also evaluates whether the call is allowed, you do not have authorisation. You have a comment.
  • Every refusal is logged with reasons. Denials are not failures. They are the signal that the system worked. Treat them as data.
  • The agent is told, in plain language, what it cannot do. Not because the prompt enforces it, but because an agent that knows the shape of its limits plans better within them.
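
On the fourth habit, one possible shape for a refusal log entry, with invented field names:

```python
# Denials recorded as structured data, not swallowed as errors.
import json
import time

def log_denial(principal: str, tool: str, resource: str,
               reasons: list[str]) -> str:
    entry = {
        "ts": time.time(),
        "event": "tool_denied",   # an outcome of the system working
        "principal": principal,
        "tool": tool,
        "resource": resource,
        "reasons": reasons,       # the exact rules that fired
    }
    line = json.dumps(entry, sort_keys=True)
    print(line)  # stand-in for an append-only audit sink
    return line
```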

What we are doing about it

At IMPT we are building an AI-native booking agent that touches payments, inventory, and on-chain offset commitments — three categories where a confused write is a real problem. The same tool-layer pattern I have described here is what sits underneath it: read tools the agent uses freely, draft tools that produce reviewable proposals, and commit tools that refuse to run without a verified human action over the exact payload.

If you are designing legal AI, or any agent that touches regulated work, the thing to do this week is small and concrete. List every action your agent can take. Sort the list into read, draft, and commit. For every commit, write down — in one sentence — what human signal authorises it, and where in your code that signal is verified. If the answer is "the prompt," you have your week's work.
