Juho Choi

Human intervention log

Engineering · May 3, 2026 · 4 min read
#ai-ml #agent-harness #patterns

A single append-only file where an autonomous harness records every task it could not automate, who handled it manually, and the condition under which automation could resume.

When to use

  • The harness runs an autonomous loop (e.g. ideate → plan → build → commit → check) that is meant to keep going without a human in each iteration.
  • The loop will inevitably hit work that is physically outside its reach: console-only API key issuance, vendor/legal approvals, DNS at the registrar, payments, security-incident judgment calls, library or platform limitations the agent cannot bypass.
  • You want future iterations (or future maintainers) to know why a workaround exists and when it would be safe to retry the automated path.

When not to use

  • One-shot or short-lived agents — there is no later iteration that will read the log.
  • Copilot-style interactive harnesses where human turns are the design, not the exception. Every action would qualify and the log degenerates into a transcript.
  • Cases where the limit is genuinely permanent and uninteresting (e.g. "only the CEO can sign this contract"). A single static note in a runbook is enough; you do not need a log entry per occurrence.

Context

An autonomous loop hits two kinds of failure: ones it can retry, and ones it physically cannot. If the human silently absorbs the second kind, three things are lost:

  • The audit trail — six months later nobody remembers why the metric was switched from successRate to medianDurationMs.
  • The retry trigger — the condition that would let the loop reclaim this task is in someone's head, not in the repo.
  • Self-knowledge — the harness has no list of its own ceilings, so it keeps re-attempting impossible work and burning tokens.

The pattern is to make every manual override an explicit, structured entry instead of a silent, undocumented rescue.

Pattern

Maintain one append-only markdown file in the repo (e.g. docs/human-intervention.md). Each intervention gets one section with four required fields:

```markdown
## YYYY-MM-DD — short title

- Context: why the harness could not handle it
- Actor: who intervened
- Action: what they actually did
- Re-automatable: yes/no — <trigger condition, with a concrete check>
```

The four fields are the minimum needed for retrospection, retry, and audit. Drop one and the entry stops being useful:

  • Context answers "why was this not automated?"
  • Actor answers "who owns the follow-up?"
  • Action answers "what is the current state of the system?"
  • Re-automatable answers "when, if ever, should the loop try again?" The trigger should be checkable without judgment: a SQL query, a feature-flag probe, a vendor-changelog URL, an issue link. A vague "someday when the API improves" is not a trigger; compare the two lines below.
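The difference is whether a machine can evaluate the condition without a human in the loop. A sketch of both forms in the Re-automatable field (the query and table name are illustrative):

```markdown
<!-- vague: nothing can decide when this fires -->
- Re-automatable: yes — someday when the API improves

<!-- checkable: any future iteration can run the query and compare -->
- Re-automatable: yes — `SELECT COUNT(*) FROM runs WHERE exit_code IS NOT NULL` returns > 0
```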

The harness can optionally read the file (or an index of it) at boot and treat listed items as known off-limits until their trigger fires.
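A minimal sketch of that boot-time read, assuming only the template above; the trigger_fired callable (how a trigger string is actually evaluated) is harness-specific and left as a hypothetical hook:

```python
import re
from pathlib import Path

# Matches the entry header from the template: "## YYYY-MM-DD — short title".
ENTRY = re.compile(r"^## (?P<date>\d{4}-\d{2}-\d{2}) [—-] (?P<title>.+)$", re.M)

def load_interventions(path: str = "docs/human-intervention.md") -> list[dict]:
    """Split the append-only log into one dict per intervention entry."""
    text = Path(path).read_text(encoding="utf-8")
    marks = list(ENTRY.finditer(text))
    entries = []
    for i, mark in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(text)
        body = text[mark.end():end]
        # Pull the yes/no flag and the trigger text off the Re-automatable line.
        field = re.search(r"Re-automatable:\s*(yes|no)\s*[—-]?\s*(.*)", body)
        entries.append({
            "date": mark["date"],
            "title": mark["title"],
            "re_automatable": bool(field and field.group(1) == "yes"),
            "trigger": field.group(2).strip() if field else "",
        })
    return entries

def off_limits(entries: list[dict], trigger_fired) -> set[str]:
    """Titles to skip this iteration. trigger_fired: str -> bool, harness-specific."""
    return {e["title"] for e in entries
            if not (e["re_automatable"] and trigger_fired(e["trigger"]))}
```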

Trade-offs

  • Discipline tax. The log is only as good as the operator's habit of writing entries. Half-logged interventions are worse than none — they imply completeness that is not there.
  • Drift. Triggers go stale (vendors ship APIs, internal limits change). The file needs periodic pruning; otherwise old entries fossilize and the loop never retries things it could now handle.
  • Not a fix. The log makes the autonomy ceiling visible, it does not raise it. A monotonically growing file is a signal the harness is accumulating debt, not paying it down.

Example

A real entry from an autonomous harness: the loop generated a feature request to add a successRate column to its agent-run dashboard. Investigation revealed the underlying tool runner did not surface exit_code for built-in tools, so the data simply did not exist. The intervention recorded:

  • Context: success/failure data is unavailable for built-in tools at the data-source layer.
  • Actor: repo owner.
  • Action: replaced the proposed successRate column with medianDurationMs, which can be computed from existing fields.
  • Re-automatable: yes — when the tool runner ships exit-code support, OR when a sampled SQL query against the runs table starts returning non-null exit codes. The query is pinned in the entry so a future iteration can evaluate the trigger automatically; the full entry is sketched after this list.
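Rendered in the template, the entry might look like this (the date, query, and exact wording are illustrative, not the verbatim log):

```markdown
## 2026-04-12 — successRate column blocked on missing exit codes

- Context: the tool runner does not surface exit_code for built-in tools, so
  success/failure data does not exist at the data-source layer
- Actor: repo owner
- Action: shipped medianDurationMs instead, computed from existing fields
- Re-automatable: yes — when the tool runner ships exit-code support, or when
  `SELECT COUNT(*) FROM runs WHERE tool_type = 'built-in' AND exit_code IS NOT NULL`
  returns > 0
```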

Because the trigger is concrete, a future loop can run the query, see the data has appeared, and reopen the original feature request without a human re-deciding the question.
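The pinned check can be as small as the sketch below (SQLite and the runs schema are assumptions; a real harness would query whatever store backs its dashboard):

```python
import sqlite3

# Hypothetical pinned query from the entry; table and column names are assumptions.
PINNED_QUERY = """
    SELECT COUNT(*) FROM runs
    WHERE tool_type = 'built-in' AND exit_code IS NOT NULL
"""

def successrate_trigger_fired(db_path: str = "harness.db") -> bool:
    """True once built-in tool runs carry exit codes, i.e. the data the
    original successRate column needed now exists."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute(PINNED_QUERY).fetchone()
    return count > 0
```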

See also

None specific to this pattern yet. The closest relatives in general engineering practice are runbooks, ADRs, and postmortems; this pattern is the autonomous-loop-specific variant focused on retry conditions.