Juho Choi

Human intervention log

Engineering · May 3, 2026 · 4 min read
#ai-ml #agent-harness #patterns

A single append-only file where an autonomous harness records every task it could not automate, who handled it manually, and the condition under which automation could resume.

When to use

  • The harness runs an autonomous loop (e.g. ideate → plan → build → commit → check) that is meant to keep going without a human in each iteration.
  • The loop will inevitably hit work that is physically outside its reach: console-only API key issuance, vendor/legal approvals, DNS at the registrar, payments, security-incident judgment calls, library or platform limitations the agent cannot bypass.
  • You want future iterations (or future maintainers) to know why a workaround exists and when it would be safe to retry the automated path.

When not to use

  • One-shot or short-lived agents — there is no later iteration that will read the log.
  • Copilot-style interactive harnesses where human turns are the design, not the exception. Every action would qualify and the log degenerates into a transcript.
  • Cases where the limit is genuinely permanent and uninteresting (e.g. "only the CEO can sign this contract"). A single static note in a runbook is enough; you do not need a log entry per occurrence.

Context

An autonomous loop hits two kinds of failure: ones it can retry, and ones it physically cannot. If the human silently absorbs the second kind, three things are lost:

  • The audit trail — six months later nobody remembers why the metric was switched from successRate to medianDurationMs.
  • The retry trigger — the condition that would let the loop reclaim this task is in someone's head, not in the repo.
  • Self-knowledge — the harness has no list of its own ceilings, so it keeps re-attempting impossible work and burning tokens.

The pattern is to make every manual override an explicit, structured entry instead of a silent, undocumented rescue.

Pattern

Maintain one append-only markdown file in the repo (e.g. docs/human-intervention.md). Each intervention gets one section with four required fields:

```markdown
## YYYY-MM-DD — short title

- Context: why the harness could not handle it
- Actor: who intervened
- Action: what they actually did
- Re-automatable: yes/no — <trigger condition, with a concrete check>
```

The four fields are the minimum needed for retrospection, retry, and audit. Drop one and the entry stops being useful:

  • Context answers "why was this not automated?"
  • Actor answers "who owns the follow-up?"
  • Action answers "what is the current state of the system?"
  • Re-automatable answers "when, if ever, should the loop try again?" The trigger should be checkable without judgment: a SQL query, a feature-flag probe, a vendor-changelog URL, an issue link. A vague "someday when the API improves" is not a trigger; compare the two lines below.
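The difference is whether a machine can evaluate the condition without a human in the loop. A sketch of both forms in the Re-automatable field (the query and table name are illustrative):

```markdown
<!-- vague: nothing can decide when this fires -->
- Re-automatable: yes — someday when the API improves

<!-- checkable: any future iteration can run the query and compare -->
- Re-automatable: yes — `SELECT COUNT(*) FROM runs WHERE exit_code IS NOT NULL` returns > 0
```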

The harness can optionally read the file (or an index of it) at boot and treat listed items as known off-limits until their trigger fires.
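A minimal sketch of that boot-time read, assuming only the template above; the trigger_fired callable (how a trigger string is actually evaluated) is harness-specific and left as a hypothetical hook:

```python
import re
from pathlib import Path

# Matches the entry header from the template: "## YYYY-MM-DD — short title".
ENTRY = re.compile(r"^## (?P<date>\d{4}-\d{2}-\d{2}) [—-] (?P<title>.+)$", re.M)

def load_interventions(path: str = "docs/human-intervention.md") -> list[dict]:
    """Split the append-only log into one dict per intervention entry."""
    text = Path(path).read_text(encoding="utf-8")
    marks = list(ENTRY.finditer(text))
    entries = []
    for i, mark in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(text)
        body = text[mark.end():end]
        # Pull the yes/no flag and the trigger text off the Re-automatable line.
        field = re.search(r"Re-automatable:\s*(yes|no)\s*[—-]?\s*(.*)", body)
        entries.append({
            "date": mark["date"],
            "title": mark["title"],
            "re_automatable": bool(field and field.group(1) == "yes"),
            "trigger": field.group(2).strip() if field else "",
        })
    return entries

def off_limits(entries: list[dict], trigger_fired) -> set[str]:
    """Titles to skip this iteration. trigger_fired: str -> bool, harness-specific."""
    return {e["title"] for e in entries
            if not (e["re_automatable"] and trigger_fired(e["trigger"]))}
```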

Trade-offs

  • Discipline tax. The log is only as good as the operator's habit of writing entries. Half-logged interventions are worse than none — they imply completeness that is not there.
  • Drift. Triggers go stale (vendors ship APIs, internal limits change). The file needs periodic pruning; otherwise old entries fossilize and the loop never retries things it could now handle.
  • Not a fix. The log makes the autonomy ceiling visible, it does not raise it. A monotonically growing file is a signal the harness is accumulating debt, not paying it down.

Example

A real entry from an autonomous harness: the loop generated a feature request to add a successRate column to its agent-run dashboard. Investigation revealed the underlying tool runner did not surface exit_code for built-in tools, so the data simply did not exist. The intervention recorded:

  • Context: success/failure data is unavailable for built-in tools at the data-source layer.
  • Actor: repo owner.
  • Action: replaced the proposed successRate column with medianDurationMs, which can be computed from existing fields.
  • Re-automatable: yes — when the tool runner ships exit-code support, OR when a sampled SQL query against the runs table starts returning non-null exit codes. The query is pinned in the entry so a future iteration can evaluate the trigger automatically; the full entry is sketched after this list.
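Rendered in the template, the entry might look like this (the date, query, and exact wording are illustrative, not the verbatim log):

```markdown
## 2026-04-12 — successRate column blocked on missing exit codes

- Context: the tool runner does not surface exit_code for built-in tools, so
  success/failure data does not exist at the data-source layer
- Actor: repo owner
- Action: shipped medianDurationMs instead, computed from existing fields
- Re-automatable: yes — when the tool runner ships exit-code support, or when
  `SELECT COUNT(*) FROM runs WHERE tool_type = 'built-in' AND exit_code IS NOT NULL`
  returns > 0
```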

Because the trigger is concrete, a future loop can run the query, see the data has appeared, and reopen the original feature request without a human re-deciding the question.
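The pinned check can be as small as the sketch below (SQLite and the runs schema are assumptions; a real harness would query whatever store backs its dashboard):

```python
import sqlite3

# Hypothetical pinned query from the entry; table and column names are assumptions.
PINNED_QUERY = """
    SELECT COUNT(*) FROM runs
    WHERE tool_type = 'built-in' AND exit_code IS NOT NULL
"""

def successrate_trigger_fired(db_path: str = "harness.db") -> bool:
    """True once built-in tool runs carry exit codes, i.e. the data the
    original successRate column needed now exists."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute(PINNED_QUERY).fetchone()
    return count > 0
```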

See also

None specific to this pattern yet. The closest relatives in general engineering practice are runbooks, ADRs, and postmortems; this pattern is the autonomous-loop-specific variant focused on retry conditions.