Ireland Quantum 100 · Benchmarks

How we will publish benchmarks — quantum volume, CLOPS, application-specific

Why benchmarks matter, and why most quantum benchmarks today are not good enough

If you are spending public or private money on quantum compute, you need numbers you can actually compare. The problem is that the quantum benchmarking landscape is fragmented: vendors quote raw qubit counts, cherry-picked fidelities, or proprietary scoring rubrics that conveniently flatter their own hardware. A 100-qubit machine with median two-qubit gate error of 1% is a very different instrument from a 100-qubit machine at 0.3%, and neither number on its own tells you whether a useful chemistry circuit will actually finish before noise eats it.

Ireland Quantum 100 will publish three layers of benchmarks: holistic system-level (quantum volume), throughput (CLOPS), and application-specific (chemistry, optimisation, climate-relevant ML). All three matter, and all three will be published with raw shot data, calibration snapshots, and circuit transpilation traces, so external researchers can reproduce or contest the result. That is the standard we hold ourselves to.

Quantum Volume: what we will report and how

Quantum volume (QV), as defined by Cross et al., is a single-number figure of merit that captures the largest square random circuit (depth = width) a system can run with at least 2/3 heavy-output probability. It folds together qubit count, connectivity, gate fidelity, measurement error, crosstalk, and compiler quality. It is not perfect — it saturates fast on hardware where compiler optimisation can't compensate for sparse connectivity — but it is honest, because every term that hurts your hardware shows up in the score.
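
To make the acceptance criterion concrete, here is a minimal sketch of the heavy-output test at the heart of QV, assuming you already have the ideal output distribution from a noiseless simulation and measured counts from hardware; the function names and toy numbers are ours, not a library API.

```python
import numpy as np

def heavy_outputs(ideal_probs: np.ndarray) -> np.ndarray:
    """Bitstrings whose ideal probability exceeds the median -- the 'heavy' set."""
    median = np.median(ideal_probs)
    return np.flatnonzero(ideal_probs > median)

def heavy_output_probability(ideal_probs: np.ndarray, counts: np.ndarray) -> float:
    """Fraction of measured shots that landed on heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    return counts[heavy].sum() / counts.sum()

def passes_qv(hops: list[float], threshold: float = 2 / 3) -> bool:
    """QV passes at a given width if the mean heavy-output probability over all
    random circuits clears 2/3 (Cross et al. additionally require statistical
    confidence -- see the bootstrap sketch below)."""
    return float(np.mean(hops)) > threshold

# Toy usage: one 2-qubit circuit, ideal distribution vs. 1000 measured shots.
ideal = np.array([0.05, 0.45, 0.35, 0.15])   # from noiseless simulation
counts = np.array([80, 400, 330, 190])       # from hardware
print(heavy_output_probability(ideal, counts))  # 0.73
```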

For a heavy-hex transmon lattice at 100 physical qubits, the realistic QV target during the Q2 2027 multi-qubit access window is in the 2^6 to 2^8 range, depending on how two-qubit gate error settles after the calibration campaign. We will not publish a QV number until we can do so with:

  • At least 100 randomly generated QV circuits per width
  • Bootstrap confidence intervals on heavy-output probability (a sketch follows this list)
  • A full calibration report — T1, T2 echo, single-qubit RB error, two-qubit RB error per coupler, readout assignment matrix — taken within 24 hours of the QV run
  • The exact transpiler version, basis-gate set, and routing heuristic used
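
A sketch of the bootstrap bullet, under the assumption that we resample over per-circuit heavy-output probabilities with a plain percentile interval; the scheme and numbers are illustrative, not our final statistical pipeline.

```python
import numpy as np

def bootstrap_hop_ci(per_circuit_hops: np.ndarray,
                     n_resamples: int = 10_000,
                     alpha: float = 0.05,
                     seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI on the mean heavy-output probability,
    resampling over circuits (one entry per random QV circuit)."""
    rng = np.random.default_rng(seed)
    n = len(per_circuit_hops)
    resampled_means = np.array([
        rng.choice(per_circuit_hops, size=n, replace=True).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.percentile(resampled_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# A QV claim at this width would need the *lower* bound to clear 2/3,
# not just the point estimate. Synthetic HOPs for illustration:
hops = np.random.default_rng(1).normal(0.72, 0.05, size=100).clip(0, 1)
lo, hi = bootstrap_hop_ci(hops)
print(f"mean HOP = {hops.mean():.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```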

If the number is disappointing, we publish anyway. A QV of 2^5 with honest provenance is more useful to the field than 2^9 with hidden post-selection.

CLOPS: throughput is the metric procurement actually cares about

Circuit Layer Operations Per Second (CLOPS) measures how many parameter-updated layers of a QV-style circuit a system can execute per second, end-to-end. It captures the things that quietly destroy real workloads: classical-quantum round-trip latency, parameter-binding overhead, control-electronics bandwidth, and the time the dilution refrigerator spends doing things other than running your circuit.

For variational workloads — VQE for ground-state chemistry, QAOA for grid optimisation, parameterised circuits for climate-finance Monte Carlo — CLOPS dominates wall-clock time. A machine with high QV and low CLOPS is useless for VQE, because each optimiser iteration needs thousands of parameter updates: at 200 ms of classical overhead per update, a thousand updates is over three minutes of dead time per iteration, and across hundreds of iterations you will never converge before the next calibration cycle.

Our CLOPS reporting commits to the original IBM-defined methodology — 100 templates, 10 parameter updates per template, 100 shots per binding — but we will additionally publish a cold CLOPS figure that includes job-submission latency from a Dublin-region endpoint. Sovereign hosting in Tipperary means the round-trip from Irish and EU users is single-digit milliseconds, and that should show up in the number.
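
For concreteness, a sketch of the CLOPS arithmetic as we understand the IBM methodology (M templates × K parameter updates × S shots × D layers, divided by elapsed time, with D = log2 QV), plus our cold variant; the elapsed time and the 5 ms latency figure below are placeholders, not measurements.

```python
# CLOPS per the IBM methodology: M templates, K parameter updates per
# template, S shots per binding, D circuit layers (log2 of the quantum volume).
M, K, S = 100, 10, 100

def clops(qv: int, elapsed_seconds: float) -> float:
    """Layer operations per second over the full benchmark wall-clock time."""
    D = int(qv).bit_length() - 1   # exact log2 for power-of-two QV, e.g. 128 -> 7
    return M * K * S * D / elapsed_seconds

# Toy numbers, not measurements: a QV=128 system finishing the suite in 350 s.
print(f"CLOPS      = {clops(128, 350.0):,.0f}")

# 'Cold' CLOPS: same work, but elapsed time also charges job-submission
# latency from a Dublin-region endpoint (assumed 5 ms per job round trip).
jobs, latency_s = M * K, 0.005
print(f"cold CLOPS = {clops(128, 350.0 + jobs * latency_s):,.0f}")
```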

Application-specific benchmarks: where we go beyond synthetic scores

Synthetic benchmarks tell you what a machine can do; application benchmarks tell you what it will do for you. Ireland Quantum's performance will be reported against a fixed suite of climate-relevant workloads, and that suite is locked in before the cryostat is energised so we cannot tune the tests after the fact.

Chemistry: VQE on small but real systems

Initial chemistry benchmarks will run VQE with hardware-efficient and UCCSD ansätze on:

  • H2O in STO-3G and 6-31G bases — a calibration molecule with known FCI energies
  • LiH bond-dissociation curve — tests sensitivity to gate error along a parameter sweep
  • N2 active-space treatment — relevant to ammonia chemistry and fertiliser decarbonisation
  • CO2 + amine adduct active-space fragments — directly relevant to carbon-capture sorbent screening

For each, we publish energy error versus FCI (where tractable), shot count required to reach chemical accuracy (1.6 mHa), and total wall-clock time including classical optimiser overhead.
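
The shot-count metric follows from standard shot-noise statistics: the standard error of an energy estimate falls as σ/√N, so the shots needed scale quadratically with the inverse of the target precision. A sketch, with an assumed illustrative estimator standard deviation:

```python
import numpy as np

def shots_for_precision(sigma_ha: float, epsilon_ha: float = 1.6e-3) -> int:
    """Shot-noise estimate: standard error falls as sigma / sqrt(N), so a
    target precision epsilon needs roughly N = (sigma / epsilon)^2 shots
    per expectation value."""
    return int(np.ceil((sigma_ha / epsilon_ha) ** 2))

# Illustrative only: an energy estimator with an effective standard
# deviation of 0.5 Ha needs ~1e5 shots per evaluation to hit 1.6 mHa.
print(shots_for_precision(0.5))   # 97657
```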

Optimisation: QAOA on grid and logistics graphs

We will run QAOA at p=1 through p=4 on Max-Cut over weighted graph instances drawn from real Irish transmission-network topology (publicly available from EirGrid). Reported metrics: approximation ratio against the classical Goemans-Williamson baseline, time-to-solution, and the degradation curve as graph size approaches the qubit count.
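
A sketch of how the approximation ratio would be scored on instances small enough to brute-force; the graph and the sampled bitstrings are toy stand-ins, and the Goemans-Williamson baseline (an SDP) is elided here.

```python
import itertools
import numpy as np

def cut_value(weights: np.ndarray, assignment: tuple[int, ...]) -> float:
    """Total weight of edges crossing the partition (weights symmetric)."""
    z = np.array(assignment)
    return 0.5 * float(np.sum(weights * (z[:, None] != z[None, :])))

def max_cut_brute_force(weights: np.ndarray) -> float:
    n = weights.shape[0]
    return max(cut_value(weights, a) for a in itertools.product((0, 1), repeat=n))

def approximation_ratio(weights: np.ndarray, samples: list[tuple[int, ...]]) -> float:
    """Mean sampled cut over the optimum -- the metric we report per p-depth."""
    best = max_cut_brute_force(weights)
    return float(np.mean([cut_value(weights, s) for s in samples]) / best)

# Toy 4-node weighted graph standing in for a transmission-network instance,
# with bitstrings as they might come back from a p=1 QAOA run.
w = np.array([[0, 1, 0, 2],
              [1, 0, 3, 0],
              [0, 3, 0, 1],
              [2, 0, 1, 0]], dtype=float)
samples = [(0, 1, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0)]
print(approximation_ratio(w, samples))  # ~0.857
```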

Quantum machine learning: kernel and variational classifiers

We will run quantum kernel methods on small climate datasets — atmospheric CO2 flux classification, satellite-derived land-use features — with an honest comparison against classical RBF-kernel SVM baselines on the same data. If the classical baseline wins, we say so.
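
A sketch of the comparison harness, assuming scikit-learn for the classical baseline: the useful property is that an SVM with kernel="precomputed" consumes a Gram matrix, so a hardware-estimated quantum kernel can be swapped in without touching the rest of the pipeline. The dataset and the placeholder kernel below are synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a small climate dataset (real features would be
# CO2-flux or land-use derived); labels are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Classical baseline: the RBF-kernel SVM every quantum kernel must beat.
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("RBF SVM accuracy:", rbf.score(X_te, y_te))

# Quantum-kernel slot: the same SVM consumes a precomputed Gram matrix, so a
# hardware-estimated fidelity kernel drops in with no other pipeline changes.
def gram(A, B, kernel_fn):
    return np.array([[kernel_fn(a, b) for b in B] for a in A])

def placeholder_kernel(a, b):            # stands in for the quantum kernel
    return float(np.exp(-np.sum((a - b) ** 2)))

qsvc = SVC(kernel="precomputed", C=1.0).fit(gram(X_tr, X_tr, placeholder_kernel), y_tr)
print("precomputed-kernel accuracy:",
      qsvc.score(gram(X_te, X_tr, placeholder_kernel), y_te))
```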

Calibration drift, error-correction roadmap, and what we will not benchmark yet

Transmon systems drift. T1 and T2 vary with thermal cycling, two-level-system defects come and go, and gate calibrations go stale within hours. Any benchmark page that quotes a single fidelity number without a timestamp is misleading. We will publish a rolling calibration dashboard with per-qubit T1, T2 echo, and per-coupler two-qubit gate error, refreshed at every recalibration cycle. Benchmark runs are tagged with the exact calibration snapshot they used.
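
One way the snapshot tagging could work, sketched with hypothetical field names: hash the calibration record and stamp every benchmark run with that hash, so a published number can always be traced back to the device state that produced it.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json

@dataclass(frozen=True)
class CalibrationSnapshot:
    """One recalibration cycle's worth of device data (fields illustrative)."""
    taken_at: str                  # ISO-8601 UTC timestamp
    t1_us: dict[int, float]        # per-qubit T1
    t2_echo_us: dict[int, float]   # per-qubit T2 echo
    twoq_error: dict[str, float]   # per-coupler two-qubit RB error

    def tag(self) -> str:
        """Content hash recorded alongside every benchmark run, so the exact
        snapshot is recoverable -- no timestamp-less fidelity numbers."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:12]

# Toy values, not device data:
snap = CalibrationSnapshot(
    taken_at=datetime.now(timezone.utc).isoformat(),
    t1_us={0: 180.2, 1: 162.7},
    t2_echo_us={0: 143.9, 1: 121.4},
    twoq_error={"0-1": 0.004},
)
print("calibration tag:", snap.tag())
```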

On error correction: we will not claim logical-qubit benchmarks until we have a demonstrated logical qubit whose error rate beats the best physical qubit it is built from. Until then, every figure on this page is a physical-qubit figure, and it will be labelled as such.

Research collaboration or early access

Talk directly with Michael. No charge for the call.

Book a research call →