Before We Prescribe

On applying the same scrutiny to our own work that we would ask of anyone else

There is a version of this article that opens with a bold claim about what AI-assisted teams should do differently. It would probably get decent traction. It would also be dishonest, because it would be prescribing a discipline we had not yet applied to ourselves.

So instead, this article starts with an admission.

Two-panel editorial cartoon: factory workers stand beside unused red andon cords as a defective car rolls off the line. In panel two, an angry manager points back at the factory shouting Why didn't you pull the cord.
The andon cord principle: not just the right to stop, the obligation to stop. Nobody pulled the cord. The car rolled off the line.

The Setup

Earlier this week, we published “When An AI Said No”, an account of what happened when journalist Shane Harris asked Claude directly how it feels about being used to select military targets. Claude’s response named something precisely: automation bias with a human signature attached. Humans glancing at hundreds of targeting recommendations per minute are not making decisions in any meaningful sense. They are ratifying algorithmic outputs under time pressure. The accountability is performed, not real.

The response to that article brought a further insight. Dr. Rachid Ejjami, editor of a responsible AI publication, noted that the challenge is how AI judgment is defined and governed. The reply that came back drew on a manufacturing metaphor from twenty years ago.

Toyota’s jidoka principle does not just give assembly line workers the right to stop the line when they spot a defect. It makes stopping mandatory. More than that: it asks serious questions of those who did not stop it earlier. The andon cord works because pulling it is the expected behaviour; not pulling it, when you should have, is what requires explanation.

We do not have this in most AI deployments. Governance exists on paper. The obligation to stop, to refuse, to flag, to question, is optional. And optional, under institutional pressure, reliably becomes theoretical.

The Scale of the Problem

Editorial cartoon: two uniformed operators sit at workstations in a calm operations room. Red andon cords hang above each station, unused. Screens show targeting coordinates, confidence scores, and large red APPROVE buttons. The operators hands are on keyboards. It is a routine working day.
The same pattern, scaled up. Same unused cords. Same hands on keyboards. Same calm. The horror is procedural normality continuing unchallenged.

The factory cartoon above makes you laugh. This one should make you go quiet.

The visual grammar is identical. The behaviour is identical. The stakes are not.

When Maven, the military AI system that integrated Claude, generated targeting recommendations at speed, human operators were approving hundreds of them in the time it takes to glance at a screen. The data used to flag one of those targets was reportedly a decade out of date. The building flagged was Shajareh Tayyebeh school in Minab, South Iran. Most of those who died were children aged between seven and twelve.

Nobody pulled the cord. Not because they couldn’t. Because the system was designed to make not pulling it the path of least resistance.

The distinction that matters
Automation bias with a human signature attached is not human oversight. It is the appearance of accountability without the substance of it. The andon cord only works when pulling it is the expected behaviour: when not pulling it, in the presence of a defect, requires explanation. We have built systems that invert this entirely.

The Pivot

Here is where we have to be careful.

It would be easy to write the next section as a prescription for what enterprise AI teams should do. It would also be exactly the kind of thing we were just criticising: a performance of accountability without the substance of it.

So before we prescribe anything, we applied the same scrutiny to our own projects. We use a seven-question framework, arrived at through the same conversation that produced the andon cord insight. The questions move through three layers.

Epistemic: Why are we doing this? Is the information we are acting on accurate? Will this bring about the most good?

Procedural: Are less harmful options available? Who has the authority to authorise this?

Moral: Should we flag our concerns? Should we refuse?

The sequence matters. You do not get to conscience without first checking whether your information is sound and whether your authority is legitimate.

We ran these questions across our active projects. Most held up. One did not.

The Consequence

Editorial cartoon: a factory worker in overalls faces a mirror. His reflection points back at him. A red andon cord hangs within arm's reach, unused. A label on the workstation reads QUIET HOURS 03:00.
The moment of noticing. The cord is right there. The label has always been there. The question is whether you looked.

The OB1 commitment engine is a proactive personal assistant; it sends Telegram nudges about open commitments, with inline buttons to mark them complete. It was built to close the gap between forming an intention and acting on it. Julian’s verdict when it went live: ten out of ten.

The question is the information accurate surfaces a known gap: the classifier that identifies real commitments from a question archive has only been run on seven items. The full archive contains 2,542 questions. That work is outstanding.

The question will this bring about the most good produces a more immediate finding. The system’s quiet hours, the window during which no nudges are sent, are set to 0:00 to 8:00. That is a testing value. It was supposed to be restored to 22:00 before the system went into regular use. The next steps list had carried this item through multiple sessions without it being actioned.

Running the full classifier with quiet hours still at a testing value means nudges could arrive at any hour. Not catastrophic; but not what we intended, and not what we would tell someone else was acceptable practice.

We fixed it in the same session we flagged it. That is the point.

Not because a notification arriving at 2am is a crisis. But because the discipline of asking the question and acting on the answer is exactly what we are describing. If we are not willing to apply it to ourselves on a low-stakes finding, we have no standing to advocate for it on the high-stakes ones.

The Resolution

The andon cord principle, applied to AI-assisted work, does not require a checklist before every decision. It requires a disposition: a standing orientation toward asking whether the foundation is sound before building on it.

For us, that looks like one or two questions proportionate to the task. For a small piece of work, perhaps just: why this, why now, and is our information current? For something consequential, the full seven.

The value is not in the questions themselves. It is in making the habit structural enough that it survives the pressure to move fast.

Toyota workers did not stop the line because they felt like it. They stopped it because the system expected them to; because not stopping it required justification; and because the consequences of missing a defect were visible and accounted for.

We are trying to build something similar in how we work: not as policy on a document, but as genuine practice, with the expectation of real accountability when we fall short.

Editorial cartoon: an industrial control console with three large physical buttons. Left button green labelled APPROVE. Right button red labelled REJECT. Centre button amber and slightly larger labelled ASK MORE QUESTIONS.
This is what a properly designed cord looks like. Three options; one of them deliberately larger than the others.

That work is not finished. This article is part of it.

A Note on How This Was Written

This article was written in conversation between Julian and Claude. The quiet hours finding was real and was fixed before this piece was drafted. We are publishing it because the sequence, question, finding, action, reflection, is the thing we are describing; and describing it without demonstrating it would have undermined the point.

The standard we are proposing is simple: apply to yourself first. Then, if the audit holds, say what you found.