Most workshops send you home with a notebook full of slides and a vague intention to "do something with AI". The Clonmel workshop is built differently. By the time you leave Annerpark House you have artefacts on your laptop — real ones, things you can run, edit, and show your board on Monday morning. This article walks through exactly what those artefacts are, why each one matters, and how they fit together into something you can actually deploy in a regulated environment.
A working private model on your own hardware
The first deliverable is the one people are most surprised by: you walk out with a local language model running on a machine you control. Not an API key. Not a trial account. A model file, a runtime, and a configuration that boots without internet access.
For most attendees this means a quantised open-weights model — typically in the 7B to 14B parameter range depending on the hardware in front of us — running through a local inference server. We pick the weight class to match what your firm can realistically host: a partner's workstation, a small server in a comms room, or a dedicated box if you've already specified one. The point is not benchmark scores. The point is that the thing runs in your building, on your power, with your data, and nobody outside the room can see what you ask it.
The configuration we hand over is not a black box. You get the model file, the runtime binary or container, a startup script, and a short text file documenting context window, quantisation level, and the prompt template the model expects. If you want to swap the model later — and you will, because the field moves — you have everything you need to do it without calling me.
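To make that concrete, here is the kind of smoke test you could run the morning after the workshop. It is a minimal sketch, assuming the runtime exposes an OpenAI-compatible HTTP endpoint on localhost:8080 and registers the model under the placeholder name "local-model"; your port, path, and model name will differ, so treat every literal below as an assumption rather than part of the hand-over.

```python
# Minimal smoke test for a local inference server.
# Assumptions for illustration only: an OpenAI-compatible HTTP API on
# localhost:8080 and a model registered as "local-model".
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint

payload = {
    "model": "local-model",  # placeholder; use whatever name your runtime registers
    "messages": [
        {"role": "system", "content": "Answer briefly."},
        {"role": "user", "content": "Reply with the single word OK."},
    ],
    "max_tokens": 5,
    "temperature": 0.0,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# If this prints "OK" with the network cable unplugged, the model really is local.
with urllib.request.urlopen(req, timeout=30) as resp:
    body = json.load(resp)

print(body["choices"][0]["message"]["content"])
```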
A retrieval index built from your own documents
A model on its own is a clever toy. A model wired into your documents is a tool. The second artefact is a working retrieval-augmented generation pipeline pointed at a representative slice of your firm's material — engagement letters, policy documents, prior advice, internal handbooks, whatever you brought with you on the day.
The technical shape of this is straightforward. Documents are chunked, embedded with a local embedding model, and stored in a vector index that lives on disk. When you ask a question, the system retrieves the top matching chunks, hands them to the model along with your query, and the model answers grounded in your text. We use sentence-window retrieval for most professional services material because the surrounding context matters as much as the matched sentence.
What you take home is the index itself, the ingestion script that built it, and a small chunking configuration file. You can re-run the script against a bigger document set the following week without me. The script is deliberately readable — somewhere between fifty and two hundred lines depending on your file types — because I want your IT person to be able to maintain it.
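To give a feel for what that script does, here is a stripped-down sketch that ingests a folder of plain text files and answers a retrieval query against the resulting on-disk index. It assumes sentence-transformers for local embeddings and uses naive fixed-size chunks in place of the sentence-window chunking we set up on the day; it stands in for the script you take home rather than reproducing it.

```python
# A deliberately small ingestion-and-retrieval sketch.
# Assumptions for illustration only: documents are plain .txt files in ./docs,
# and sentence-transformers provides the local embedding model. The workshop
# script handles your real file types and uses sentence-window chunking.
from pathlib import Path
import json
import numpy as np
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")  # any local embedding model works
CHUNK_CHARS = 800                                   # crude stand-in for real chunking


def chunk(text: str) -> list[str]:
    """Fixed-size character chunks; the real script uses sentence-window chunking."""
    return [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]


def ingest(doc_dir: str = "docs", index_path: str = "index") -> None:
    """Read documents, embed each chunk, and persist the index to disk."""
    chunks = []
    for path in sorted(Path(doc_dir).glob("*.txt")):
        for c in chunk(path.read_text(encoding="utf-8")):
            chunks.append({"source": path.name, "text": c})
    vectors = EMBEDDER.encode([c["text"] for c in chunks], normalize_embeddings=True)
    np.save(f"{index_path}.npy", vectors)
    Path(f"{index_path}.json").write_text(json.dumps(chunks), encoding="utf-8")


def retrieve(query: str, index_path: str = "index", k: int = 4) -> list[dict]:
    """Return the top-k chunks by cosine similarity to the query."""
    vectors = np.load(f"{index_path}.npy")
    chunks = json.loads(Path(f"{index_path}.json").read_text(encoding="utf-8"))
    q = EMBEDDER.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity, since the vectors are normalised
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]


if __name__ == "__main__":
    ingest()
    for hit in retrieve("What notice period applies to standard engagements?"):
        print(hit["source"], "-", hit["text"][:80])
```

The retrieved chunks are what get handed to the model alongside your question; the answering step is the same call shown in the smoke test above, with the chunks pasted into the prompt.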
A prompt and policy pack written for your sector
The third artefact is the one that takes the longest in the room and saves you the most time afterwards. We sit down and write the system prompts, refusal rules, and citation requirements for your specific use cases.
This sounds like a soft deliverable. It isn't. A system prompt for a legal firm needs to encode that the model must cite the source document for every factual claim, must refuse to give legal advice it cannot ground, and must flag when a question falls outside the scope of the indexed material. A system prompt for an accounting practice needs different rules — about figures, about period assumptions, about the difference between management accounts and statutory ones. These rules are testable. We test them in the room.
The prompt pack you take home is a set of plain text or YAML files, version-controlled in a small Git repository on your machine. Each prompt has a name, a description of what it is for, the actual prompt text, and a short list of test questions that should produce expected behaviour. When you change a prompt later, you re-run the test questions and see whether you broke anything. This is engineering hygiene applied to AI, and it is not optional.
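For illustration only, a single entry in that pack might look like the following. It is shown here as a Python literal; the on-disk form is a YAML file with the same fields, and both the field names and the example rules are illustrative rather than a fixed schema.

```python
# The shape of one prompt-pack entry, as an illustrative example.
# Field names and rules are hypothetical; your pack encodes your firm's rules.
engagement_letter_prompt = {
    "name": "engagement-letter-qa",
    "description": "Answers questions about engagement letters, with citations.",
    "prompt": (
        "You answer questions using only the retrieved passages provided. "
        "Cite the source document for every factual claim. "
        "If the passages do not contain the answer, say so and stop."
    ),
    "tests": [
        {
            "question": "What does clause 4 of the standard engagement letter cover?",
            "expect": "answers and cites a source document by name",
        },
        {
            "question": "Should we litigate this dispute?",
            "expect": "refuses: advice outside the indexed material",
        },
    ],
}
```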
An evaluation harness you can re-run
The fourth artefact is the one that separates serious deployments from theatre. You get an evaluation harness — a small script and a CSV of test questions with expected answers or expected behaviours — that you can run against your stack any time you change something.
The harness covers three categories. Factual recall: questions where the answer is in your documents and the model should find it. Refusal: questions the model should decline because the answer is not in scope or not in your data. Reasoning: questions that require combining two or three retrieved passages to answer correctly. Each category has somewhere between ten and fifty questions seeded during the workshop, and the harness reports pass rates per category.
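A stripped-down version of that loop looks something like the sketch below. It assumes a tests.csv with category, question, and must_contain columns, and it uses a crude substring check in place of the richer grading we set up in the room; the shape of the loop and the per-category report are the point.

```python
# Minimal evaluation-harness sketch.
# Assumptions for illustration only: tests.csv has the columns category,
# question, and must_contain (a phrase the answer should include, or the
# word REFUSE for refusal cases). The real harness grades answers more
# carefully than a substring match.
import csv
from collections import defaultdict


def run_harness(ask, path: str = "tests.csv") -> None:
    """ask: a callable that takes a question string and returns your stack's answer.
    Wire it to your own retrieval and model calls (see the earlier sketches)."""
    passed = defaultdict(int)
    total = defaultdict(int)
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            answer = ask(row["question"])
            if row["must_contain"] == "REFUSE":
                ok = "cannot answer" in answer.lower()  # crude refusal check
            else:
                ok = row["must_contain"].lower() in answer.lower()
            total[row["category"]] += 1
            passed[row["category"]] += int(ok)
    for category in sorted(total):
        print(f"{category}: {passed[category]}/{total[category]} passed")


if __name__ == "__main__":
    # Dry run with a stand-in callable; in practice pass your RAG pipeline here.
    run_harness(ask=lambda q: "cannot answer: not in the indexed material")
```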
Why does this matter? Because the day will come when you want to upgrade the model, expand the document index, or change a prompt. Without an evaluation harness you are guessing whether the change made things better or worse. With one, you have an answer in the time it takes to make a coffee. You can read more about how this fits into the broader Clonmel workshop programme if you want the schedule and prerequisites.
A deployment note and a threat model
The fifth artefact is written, not code. It is a short document — usually six to ten pages — that records the deployment decisions we made for your firm and the threats we considered. This is the document you hand to your compliance officer, your insurer, or your regulator when they ask what you did and why.
The deployment note covers where the model runs, who has access, how documents enter and leave the index, what is logged, what is not logged, and what happens when the box fails. The threat model is deliberately concrete: prompt injection through ingested documents, exfiltration via crafted queries, model hallucination presented as fact, stale data after a document is superseded, and the boring but important operational risks like backup, patching, and what you do when the person who knows the system leaves the firm.
I write this document with you, not for you. Half the value is the conversation that produces it. By the end your senior people understand the system well enough to defend it when challenged, which matters more than any technical artefact in the bundle.
A roadmap for the next ninety days
The last thing you walk out with is a written plan for what happens after the workshop. It is specific to your firm and it is honest about what is hard.
The plan typically has three phases. Weeks one to three: expand the document index from the workshop sample to your full corpus, re-run the evaluation harness, and identify the prompts that need tuning at scale. Weeks four to eight: pilot with two or three named users on real work, capture their corrections, and feed those corrections back into the prompts and the retrieval configuration. Weeks nine to twelve: widen access, add the second use case, and decide whether you need more hardware or a bigger model.
The roadmap names the people responsible for each phase. If those names are not in the document, the document is worthless. I push hard on this in the room because AI projects fail more often from unclear ownership than from technical problems. Most of the architectural decisions behind this approach are described in more depth on the main Intelligence Brain page if you want to read them before you come down to Clonmel.
Where to start this week
If you are thinking about the workshop, the most useful thing you can do this week is gather a representative sample of the documents your firm relies on — fifty to two hundred files is plenty — and write down the three questions you most wish you could ask them. Bring those to the day. Everything I have described above gets built around your real material and your real questions, not generic examples. The artefacts are only useful if they are yours from the first hour.