Most AI rollouts I see in Irish firms fail in the same place: someone buys a licence, a few people use it for email drafting, and six months later nobody can point to a single piece of institutional knowledge that has actually moved into the system. The work is being done — but the intelligence is leaking. Intelligence injection is the pattern I built to stop that leak. It's a four-stage methodology I've refined across rollouts in legal, accounting, property and public-sector contexts, and it's the operational spine behind everything I deploy under the Intelligence Brain product line. This article walks through the four stages in detail, with the engineering decisions behind each one.
Why "injection" and not "training"
I deliberately don't call this training. Training implies the model is the thing being changed. In a regulated firm, you almost never want to fine-tune a base model on your data — the cost is wrong, the audit trail is wrong, and the moment the underlying model updates, your investment evaporates. Intelligence injection treats the model as a fixed reasoning engine and treats your firm's intelligence — its precedents, its house style, its risk appetite, its decision history — as the variable that gets injected at inference time.
That distinction matters because it changes what you build. You're not building a training pipeline. You're building a retrieval layer, a context-assembly layer, a policy layer, and a feedback loop. Those four things, in that order, are the four stages.
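To make that ordering concrete, here is a deliberately bare skeleton of the runtime path in Python. Every function is a stub standing in for one of the four stages described below; none of the names come from a real deployment.

```python
def retrieve(query: str) -> list[str]:
    # Stage 1 output: captured, addressable units (stubbed here).
    return []

def assemble_context(query: str, units: list[str]) -> str:
    # Stage 2: rank, budget, and assemble the prompt (stubbed here).
    return query

def guarded_inference(prompt: str) -> str:
    # Stage 3: pre-guard, model call, post-guard (stubbed here).
    return ""

def record_feedback(query: str, output: str) -> None:
    # Stage 4: log signals; never auto-promote into the corpus.
    pass

def answer(query: str) -> str:
    units = retrieve(query)
    prompt = assemble_context(query, units)
    output = guarded_inference(prompt)
    record_feedback(query, output)
    return output
```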
Stage one — capture: turning tacit knowledge into addressable units
The first stage is the one most firms underestimate. Capture is not "point the system at the shared drive". A shared drive contains documents; it does not contain intelligence. Intelligence is the reasoning that produced those documents — why this clause was used here and not the standard one, why this file was settled rather than litigated, why this loan was declined.
In practice, capture splits into three streams:
- Document corpus: the existing artefacts — contracts, file notes, valuations, reports. These get chunked, embedded, and indexed. Standard retrieval-augmented generation territory, but with one important constraint in Irish firms: the chunking strategy has to respect document structure (clauses, schedules, appendices), because lawyers and accountants reason at that granularity, not at the 512-token level. There's a chunking sketch after this list.
- Decision log: a lightweight capture mechanism — usually a structured form or a Slack/Teams bot — that prompts the practitioner to record the reasoning behind a non-obvious decision. Two or three sentences. This is the layer that turns a firm's tacit expertise into something the system can later cite.
- House rules: the explicit policies, style guides, risk thresholds and approval matrices. These tend to live in people's heads or in PDFs nobody reads. They need to become structured, versioned, and addressable — typically as a set of YAML or JSON policy documents that the orchestration layer can load deterministically.
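Here's the chunking sketch promised above. It's minimal, and the heading regex is purely an assumption about one document family; the point is that chunk boundaries follow the clauses and schedules a practitioner would cite, not fixed token windows.

```python
import re

# Heading pattern is an assumption; real precedent banks need one per
# document family (contracts, schedules, statutory accounts, etc.).
HEADING = re.compile(r"^(Clause\s+\d+[\d.]*|Schedule\s+\d+|Appendix\s+[A-Z])",
                     re.MULTILINE)

def chunk_by_structure(text: str) -> list[dict]:
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts = [0] + starts  # keep any preamble before the first heading
    boundaries = starts + [len(text)]
    chunks = []
    for start, end in zip(boundaries, boundaries[1:]):
        body = text[start:end].strip()
        heading = body.splitlines()[0] if body else ""
        # Each chunk is a unit a practitioner would actually cite:
        # a clause, a schedule, an appendix, not a 512-token window.
        chunks.append({"heading": heading, "text": body})
    return chunks
```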
The engineering rule I follow: every captured unit needs a stable identifier, a source, a timestamp, and a confidence or authority level. Without those four metadata fields you cannot do stage four properly, and you cannot pass an audit.
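One way to make the four fields unavoidable is to route every captured unit, whatever its stream, through a single record type. A minimal sketch with field names of my own choosing; I collapse "confidence or authority" into a single integer level here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CapturedUnit:
    unit_id: str            # stable identifier, never reused
    source: str             # file path, matter reference, or decision-log entry
    captured_at: datetime   # timestamp, always UTC
    authority: int          # e.g. 0 = draft, 1 = associate, 2 = partner-approved
    text: str

# Example: a decision-log entry captured from a practitioner's short note.
unit = CapturedUnit(
    unit_id="dec-2024-0173",
    source="decision-log/teams-bot",
    captured_at=datetime.now(timezone.utc),
    authority=1,
    text="Settled rather than litigated because the counterparty's costs "
         "exposure made early settlement cheaper for the client.",
)
```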
Stage two — context assembly: the prompt is the product
Stage two is where most of the actual cleverness lives, and where the difference between a hobbyist deployment and a production one becomes obvious. Context assembly is the runtime process of taking a user's query, deciding which captured units are relevant, ranking them, and assembling them into a prompt that the reasoning model can act on.
A naive RAG pipeline does one similarity search and stuffs the top five chunks into the prompt. That is not enough for a regulated firm. A proper context-assembly layer does several things in sequence (there's a sketch of the full sequence after this list):
- Query classification: is this a drafting task, a research task, a review task, or a decision-support task? Each one wants a different context shape.
- Multi-source retrieval: hit the document index, the decision log, and the policy store independently. The policy store is queried by rule, not by similarity — you don't want the model to "miss" a mandatory policy because the wording wasn't close enough.
- Authority ranking: a partner's signed-off precedent outranks an associate's draft. A current policy outranks a superseded one. This ranking is explicit, not learned.
- Token budgeting: with finite context windows, you have to make trade-offs. I usually reserve a fixed portion for policy (non-negotiable), a portion for high-authority precedent, and the remainder for similarity-matched material.
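A compressed sketch of that sequence, assuming classification has already happened and each source has been retrieved upstream. The budget numbers and the `Unit` type are placeholders; what matters is the shape: policy first and by rule, precedent re-ranked by explicit authority, fixed budgets per section.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    text: str
    authority: int        # explicit, not learned: e.g. 2 = partner-approved
    similarity: float = 0.0

def est_tokens(s: str) -> int:
    # Crude token estimate; close enough for budgeting purposes.
    return len(s) // 4

def fit_to_budget(units: list[Unit], budget: int) -> list[Unit]:
    out, used = [], 0
    for u in units:
        cost = est_tokens(u.text)
        if used + cost > budget:
            break
        out.append(u)
        used += cost
    return out

def assemble_context(policies: list[Unit], precedents: list[Unit],
                     decisions: list[Unit]) -> str:
    # Policy first: selected upstream by rule (task type, matter category),
    # never by similarity, so nothing mandatory can be "missed".
    chosen = fit_to_budget(policies, 2_000)
    # Precedent: similarity candidates re-ranked by explicit authority.
    ranked = sorted(precedents, key=lambda u: (-u.authority, -u.similarity))
    chosen += fit_to_budget(ranked, 4_000)
    # Decision log: the captured reasoning, retrieved independently.
    chosen += fit_to_budget(decisions, 2_000)
    return "\n\n".join(u.text for u in chosen)
```

Reserving the policy budget first, rather than letting similarity-matched material crowd it out, is what keeps stage three meaningful: the policy the guard enforces is always in the context the model saw.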
The output of stage two is a fully assembled prompt with every included unit traceable back to its source. That traceability is what makes stage three possible.
Stage three — guarded inference: the policy layer
Stage three is where the reasoning happens, but it's wrapped in two guards: a pre-inference guard and a post-inference guard.
The pre-inference guard checks the assembled context against hard rules. Is the user authorised to ask this? Does the query touch a category of data that requires a specific disclaimer or workflow? Is there a conflict-of-interest flag on any of the retrieved material? In legal and accounting contexts these checks are not optional — they're how you stay compliant with professional conduct rules. I treat them as deterministic code, not as instructions to the model. You do not ask an LLM to enforce a conflict check; you enforce it before the LLM ever sees the data.
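Here's what "deterministic code, not instructions to the model" looks like as a sketch. The role check and the conflict register are stand-ins for whatever identity and conflicts systems the firm already runs:

```python
class GuardRejection(Exception):
    """Raised before any model call; the query never reaches the LLM."""

def pre_inference_guard(user_roles: set[str], required_role: str,
                        retrieved_sources: list[str],
                        conflict_register: set[str]) -> None:
    # Authorisation: enforced as code, not as a system-prompt instruction.
    if required_role not in user_roles:
        raise GuardRejection(f"user lacks required role: {required_role}")
    # Conflict-of-interest: if any retrieved source is flagged, the
    # request stops here and the model never sees the material.
    flagged = [s for s in retrieved_sources if s in conflict_register]
    if flagged:
        raise GuardRejection(f"conflict flag on retrieved sources: {flagged}")
```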
The post-inference guard examines the model's output before it reaches the user. Common checks: did the response cite sources where it was supposed to? Did it stay inside the firm's house style? Did it produce a number that doesn't appear in any retrieved document (a hallucination signal)? Did it make a recommendation in a category where the firm's policy is to defer to a human?
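One of those checks sketched as code: the unsupported-number check. The regex is deliberately crude; a real deployment normalises currency symbols, dates and percentages before comparing.

```python
import re

NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def unsupported_numbers(output: str, retrieved_texts: list[str]) -> list[str]:
    # Collect every number that appears anywhere in the retrieved material.
    supported = set()
    for text in retrieved_texts:
        supported.update(NUMBER.findall(text))
    # Any number in the output with no source is a hallucination signal.
    return [n for n in NUMBER.findall(output) if n not in supported]

def post_inference_guard(output: str, retrieved_texts: list[str]) -> bool:
    """True if the output may be released; False means hold for review."""
    return not unsupported_numbers(output, retrieved_texts)
```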
The guards aren't there to make the system safe in some abstract sense. They're there because in a professional services firm, the cost of one bad output reaching a client is higher than the cost of a hundred outputs being held for review. Asymmetric risk demands asymmetric controls. This is the engineering substance behind the methodology I use across deployments — and it's the part most off-the-shelf tools simply don't do.
Stage four — feedback: closing the loop without poisoning the well
Stage four is the one that separates a system that gets better over time from one that quietly degrades. Every output the system produces, and every action a user takes on that output — accept, edit, reject, escalate — is a signal. The question is what to do with it.
The temptation is to feed every accepted output back into the corpus. Don't. That's how you get model collapse at the firm level: the system starts citing its own previous outputs, errors compound, and the institutional knowledge slowly shifts from "what the firm actually knows" to "what the system has previously said". I've seen this happen in pilots and it's hard to reverse.
The pattern I use instead has three components, sketched in code after this list:
- Edits become signals, not sources. When a user edits an output, the diff is logged and used to tune retrieval and ranking — not added to the corpus as a new document.
- Acceptances are weak signals. They tell you the output was good enough, not that it was correct. They adjust ranking weights but don't promote material to higher authority.
- Promotions are explicit. When a piece of output should become firm intelligence — a new precedent, a new policy clarification — a human promotes it through a defined workflow. It gets a source, an author, an authority level, and enters the corpus the same way any captured artefact does.
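Here's a sketch of the three paths with placeholder weights. The only method that writes to the corpus is the explicit, attributed promotion:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeedbackStore:
    ranking_weights: dict[str, float] = field(default_factory=dict)
    edit_log: list[tuple[str, str]] = field(default_factory=list)
    corpus: list[dict] = field(default_factory=list)

    def record_edit(self, unit_id: str, diff: str) -> None:
        # Edits are signals, not sources: the diff is logged against the
        # unit that produced the output, never added to the corpus.
        self.edit_log.append((unit_id, diff))
        self.ranking_weights[unit_id] = self.ranking_weights.get(unit_id, 0.0) - 0.1

    def record_acceptance(self, unit_id: str) -> None:
        # Weak signal: "good enough", not "correct". A small nudge only,
        # and no promotion to higher authority.
        self.ranking_weights[unit_id] = self.ranking_weights.get(unit_id, 0.0) + 0.02

    def promote(self, text: str, author: str, authority: int, source: str) -> None:
        # The only path into the corpus: an explicit, attributed decision
        # made by a human through the firm's promotion workflow.
        self.corpus.append({
            "text": text, "author": author, "authority": authority,
            "source": source, "captured_at": datetime.now(timezone.utc),
        })
```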
This is slower than full automation, deliberately. In a regulated firm, the speed of the feedback loop is bounded by professional accountability, not by engineering capacity.
How the four stages fit together in an Irish rollout
One observation from doing this work in Ireland specifically: the four stages map cleanly onto the regulatory and cultural realities of Irish professional services firms. Capture respects the fact that Irish firms keep more in partners' heads than they admit. Context assembly respects the fact that policy and precedent carry different weights in our professional bodies. Guarded inference respects the conduct rules of the Law Society, Chartered Accountants Ireland, and the public-sector frameworks. Feedback respects the fact that no Irish firm I've worked with wants a system that learns autonomously from client matters.
The methodology is also what makes on-premise deployment tractable. Because no stage requires fine-tuning, the entire pipeline runs against a fixed model — local or hosted — with all the firm-specific intelligence sitting in your own infrastructure. That's the architecture behind the Intelligence Brain and it's the only architecture I've found that survives both a security review and a partner's scepticism.
Where to start this week
If you want to begin applying this pattern, don't start with the model and don't start with the tooling. Start with stage one, and start narrow. Pick one workflow in your firm — a single recurring task, ideally one a senior person currently does and complains about. Spend a week capturing the documents, the decisions, and the house rules that govern that one workflow. If you can't write down the policy in plain English, you're not ready for AI on it yet — and that's a finding worth having on its own. Once you have that one workflow captured cleanly, the other three stages have something real to operate on. Everything else follows from there.