We didn't buy AI; we built one, then taught it to admit defeat

Every agency in Britain bolted “AI-powered” onto its homepage this year; we spent three months teaching ours to tell a client when it can’t do the job.

I want to be precise about what that sentence means, because “AI-powered” has been worn so smooth by overuse that it now means almost nothing. For most agencies it means they have a ChatGPT subscription and someone who pastes briefs into it. There’s no shame in using the tools — I use them every day — but it isn’t a capability. It’s a habit you share with several hundred million other people.

What I’m describing is different, and the difference is the whole point of this post. We built an AI layer into our research and survey work that does specific, bounded jobs to a measurable standard. And the most useful thing it does is know where its competence stops.

What it actually does

Here’s a real example, stripped of client names because the work isn’t mine to put a logo on.

We built a system that turns a plain-English research brief into a fully structured survey screener — the set of qualifying questions that decides who’s eligible to take a study. Writing those by hand is slow, fiddly, and exactly the kind of work where a tired human introduces small errors at eleven at night. So we handed the mechanical part to a machine.

But not the way you’d guess. The trick — and it’s the line I’m proudest of — is this: the AI isn’t asked to write prose. It’s constrained to call a specific tool with a defined schema. It doesn’t return a paragraph that a human then has to interpret and retype. It returns structured data shaped to fit the survey system directly, and every field is validated against that schema before anything is inserted. If what it produces doesn’t match — wrong type, missing field, a value out of range — the system rejects it and retries, with feedback on exactly what went wrong, until it conforms.

That constraint is the difference between a party trick and a tool. A model asked to “write me a screener” will cheerfully invent something plausible-looking that’s subtly broken. A model constrained to call a defined tool either produces something that fits the rules or doesn’t get to produce anything at all. We pay roughly fifteen pence per parse for that. It is the cheapest reliable colleague I’ve ever hired.

There’s a second operation I like even more. It takes a telephone-interview screener and converts it into a self-completion web survey in a single click — rewriting the register from “interviewer reads this aloud” to “respondent reads this on a phone,” broadening answer options, suppressing questions we already know the answer to, and then writing a changelog of precisely what it changed and why. The human doesn’t rewrite from a blank page. They read a summary of the changes and approve or correct. The judgement stays human; the typing doesn’t.
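One small piece of that conversion, suppressing questions whose answers are already known and logging each change with its reason, can be sketched in Python as below. This is only an illustration: the `Change` record, the question dictionaries, and the `known_answers` lookup are all hypothetical, and the rewording step, which is the model’s job, is not shown.

```python
from dataclasses import dataclass


@dataclass
class Change:
    question_id: str
    action: str  # e.g. "suppressed"
    reason: str


def suppress_known(
    questions: list[dict], known_answers: dict[str, str]
) -> tuple[list[dict], list[Change]]:
    """Drop questions we already hold an answer for, recording why each was dropped."""
    kept: list[dict] = []
    changelog: list[Change] = []
    for q in questions:
        if q["id"] in known_answers:
            reason = f"answer already on file: {known_answers[q['id']]}"
            changelog.append(Change(q["id"], "suppressed", reason))
        else:
            kept.append(q)
    return kept, changelog
```

A reviewer would then read the `changelog` list rather than diffing the two screeners by hand.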

And running quietly underneath it all is a quality-scoring system that checks every respondent — duplicate detection across two separate databases, a decay curve for people who take surveys too often, VPN and proxy checks — and resolves all of it into a single score from nought to a hundred. On a panel of around 290,000 people, that is not work you want a human doing by hand, and it’s not work you want done badly.
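As a rough illustration of how several signals can resolve into one score, here is a minimal Python sketch. The penalties, the free allowance of five surveys, the decay rate, and the field names are all invented for the example; the actual weights and curve are not given in this post.

```python
import math
from dataclasses import dataclass


@dataclass
class Respondent:
    duplicate_in_db_a: bool      # hypothetical flag from the first database
    duplicate_in_db_b: bool      # hypothetical flag from the second database
    surveys_last_30_days: int
    uses_vpn_or_proxy: bool


def quality_score(r: Respondent) -> int:
    """Combine the checks into a single score from 0 to 100 (illustrative weights)."""
    score = 100.0
    if r.duplicate_in_db_a or r.duplicate_in_db_b:
        score -= 60  # assumed penalty for a duplicate in either database
    if r.uses_vpn_or_proxy:
        score -= 30  # assumed penalty for VPN or proxy use
    # Assumed decay curve: no penalty up to 5 surveys, exponential decay beyond that.
    excess = max(0, r.surveys_last_30_days - 5)
    score *= math.exp(-0.1 * excess)
    return max(0, min(100, round(score)))
```

Under these made-up weights, a clean respondent who has taken three surveys this month scores 100, while one flagged as a duplicate and using a VPN scores 10.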

The bit we can’t do — and why I’m telling you

Now the part that most agencies would quietly leave off the slide.

The system produces flat quotas reliably. Tell it you need 200 men and 300 women, or a regional spread across the UK, and it’ll structure that correctly every time. What it does not yet do is build interlocking quota matrices, where one group’s breakdown is itself a structured table inside another: you need 200 men, a specific age spread among those men, and an age spread that differs by region. That is the nested, conditional version of the problem. We’re working on it. It isn’t solved.
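The difference between the two shapes is easy to see as data. The Python sketch below is illustrative only: the 200 men and 300 women come from the example above, while the region names, age bands, and cell targets are invented.

```python
# Flat quotas: one independent target per category.
flat_quotas = {
    "gender": {"male": 200, "female": 300},
}

# Interlocking quotas: the age spread for men is a table nested inside each region,
# and it differs from one region to the next (all cell values are made up).
interlocking_quotas = {
    "male": {
        "North": {"18-34": 30, "35-54": 40, "55+": 30},
        "South": {"18-34": 50, "35-54": 30, "55+": 20},
    },
}


def total(node) -> int:
    """Sum every leaf target beneath a quota node, however deeply nested."""
    if isinstance(node, int):
        return node
    return sum(total(child) for child in node.values())


def is_consistent(flat: dict, nested: dict, dimension: str, group: str) -> bool:
    """Check that the nested cells for a group add up to its flat target."""
    return total(nested[group]) == flat[dimension][group]
```

Here `is_consistent(flat_quotas, interlocking_quotas, "gender", "male")` holds because the six cells sum to 200. Producing the nested table, with every cell consistent with every parent target, is the part described above as unsolved.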

I could have left that paragraph out. You’d never have known. And that’s exactly why it’s in.

We say so out loud because pretending otherwise would catch us out the first time a client asked. Imagine the alternative: we let the impression stand that the machine handles everything, a client hands us a genuinely interlocking quota design, and we discover the limitation live, in front of them, with a deadline attached. That’s not a marketing problem at that point. That’s a trust problem, and it’s terminal.

An AI that knows its own edges is worth more than one that bluffs, for the same reason a tradesperson who tells you a job is outside their scope is worth more than one who has a go and leaves you with a leak. The honesty isn’t a softer, nicer alternative to capability. It is the capability. Knowing exactly where the reliable part ends is what makes the reliable part trustworthy.

Heavy in the plumbing, human at the surface

There’s a debate running through marketing right now that treats AI as a binary: either you’re all-in and your output is machine-slop, or you’re a purist and you refuse to touch it. I think that’s a false choice, and we’re living proof you can sit in the middle on purpose.

We use AI heavily — but in the plumbing. The parsing, the validation, the conversion, the scoring: the work people find least interesting and are most likely to get wrong at the end of a long day. What we don’t do is let it write the things that are supposed to sound like a person. The strategy, the judgement about who a study is really for, the voice — that stays with the humans, because that’s the part where being human is the whole value.

If you’ve read this far you can probably tell I’d rather show you the workings than the badge. That’s deliberate. The people you’d work with here are the ones who built this — operators, not account executives who’ll relay your questions to a developer you never meet. You can read a bit about how we’re set up if that matters to you, and it should.

So if you’re evaluating an agency on its AI claims, here’s the test I’d apply to us and everyone else: ask what it can’t do. If the answer is a confident silence, be careful. If the answer is a specific, documented limitation someone can explain to you — like the one I’ve just handed you for free — you’re talking to people who actually built the thing. Start a conversation and put us to that test.
