Human intervention log
A single append-only file where an autonomous harness records every task it could not automate, who handled it manually, and the condition under which automation could resume.
When to use
- The harness runs an autonomous loop (e.g. ideate → plan → build → commit → check) that is meant to keep going without a human in each iteration.
- The loop will inevitably hit work that is physically outside its reach: console-only API key issuance, vendor/legal approvals, DNS at the registrar, payments, security-incident judgment calls, library or platform limitations the agent cannot bypass.
- You want future iterations (or future maintainers) to know why a workaround exists and when it would be safe to retry the automated path.
When not to use
- One-shot or short-lived agents — there is no later iteration that will read the log.
- Copilot-style interactive harnesses where human turns are the design, not the exception. Every action would qualify and the log degenerates into a transcript.
- Cases where the limit is genuinely permanent and uninteresting (e.g. "only the CEO can sign this contract"). A single static note in a runbook is enough; you do not need a log entry per occurrence.
Context
An autonomous loop hits two kinds of failure: ones it can retry, and ones it physically cannot. If the human silently absorbs the second kind, three things are lost:
- The audit trail — six months later nobody remembers why the metric was switched from `successRate` to `medianDurationMs`.
- The retry trigger — the condition that would let the loop reclaim this task is in someone's head, not in the repo.
- Self-knowledge — the harness has no list of its own ceilings, so it keeps re-attempting impossible work and burning tokens.
The pattern is to make every manual override an explicit, structured entry instead of a silent, undocumented rescue.
Pattern
Maintain one append-only markdown file in the repo (e.g. `docs/human-intervention.md`). Each intervention gets one section with four required fields:
## YYYY-MM-DD — short title
- Context: why the harness could not handle it
- Actor: who intervened
- Action: what they actually did
- Re-automatable: yes/no — <trigger condition, with a concrete check>
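Writing entries through a small helper keeps the four fields mandatory. A minimal sketch, assuming the file path suggested above (the function name and signature are illustrative, not part of the pattern):

```python
from datetime import date
from pathlib import Path

LOG = Path("docs/human-intervention.md")

def log_intervention(title, context, actor, action, re_automatable):
    """Append one structured entry; refuse entries with blank fields."""
    fields = {"Context": context, "Actor": actor,
              "Action": action, "Re-automatable": re_automatable}
    blank = [name for name, value in fields.items() if not value.strip()]
    if blank:
        raise ValueError(f"entry rejected, blank fields: {blank}")
    lines = [f"\n## {date.today().isoformat()} — {title}"]
    lines += [f"- {name}: {value}" for name, value in fields.items()]
    LOG.parent.mkdir(parents=True, exist_ok=True)
    with LOG.open("a", encoding="utf-8") as f:  # append-only by construction
        f.write("\n".join(lines) + "\n")
```

Opening the file in append mode (never write mode) is what makes the log append-only without any extra machinery.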
The four fields are the minimum needed for retrospection, retry, and audit. Drop one and the entry stops being useful:
- Context answers "why was this not automated."
- Actor answers "who owns the follow-up."
- Action answers "what is the current state of the system."
- Re-automatable answers "when, if ever, should the loop try again?" The trigger should be checkable without judgment — a SQL query, a feature-flag probe, a vendor-changelog URL, an issue link. A vague "someday when the API improves" is not a trigger.
The harness can optionally read the file (or an index of it) at boot and treat listed items as known off-limits until their trigger fires.
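That boot-time read can be a simple scan of the entry headers, a sketch assuming the `## YYYY-MM-DD — title` format above; the `trigger_checks` registry mapping titles to trigger callables is a hypothetical convention, not something the pattern prescribes:

```python
import re
from pathlib import Path

# Matches the entry header format used in the log file.
ENTRY_RE = re.compile(r"^## \d{4}-\d{2}-\d{2} — (?P<title>.+)$")

def off_limits(log_path, trigger_checks):
    """Return titles of interventions whose retry trigger has not fired.

    trigger_checks maps an entry title to a zero-argument callable that
    evaluates its pinned trigger; entries without a check stay blocked.
    """
    blocked = set()
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue
        title = m.group("title")
        check = trigger_checks.get(title)
        if check is None or not check():
            blocked.add(title)
    return blocked
```

Items in the returned set are skipped by the loop; once a trigger callable starts returning true, the entry drops out of the set and the task is eligible for retry.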
Trade-offs
- Discipline tax. The log is only as good as the operator's habit of writing entries. Half-logged interventions are worse than none — they imply completeness that is not there.
- Drift. Triggers go stale (vendors ship the missing APIs, internal limits change). The file needs periodic pruning; otherwise old entries fossilize and the loop never retries things it could now handle.
- Not a fix. The log makes the autonomy ceiling visible, it does not raise it. A monotonically growing file is a signal the harness is accumulating debt, not paying it down.
Example
A real entry from an autonomous harness: the loop generated a feature request to add a `successRate` column to its agent-run dashboard. Investigation revealed the underlying tool runner did not surface `exit_code` for built-in tools, so the data simply did not exist. The intervention recorded:
- Context: success/failure data is unavailable for built-in tools at the data-source layer.
- Actor: repo owner.
- Action: replaced the proposed `successRate` column with `medianDurationMs`, which can be computed from existing fields.
- Re-automatable: yes — when the tool runner ships exit-code support, OR when a sampled SQL query against the runs table starts returning non-null exit codes. The query is pinned in the entry so a future iteration can evaluate the trigger automatically.
Because the trigger is concrete, a future loop can run the query, see the data has appeared, and reopen the original feature request without a human re-deciding the question.
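A pinned query trigger like this one is a few lines to evaluate. A sketch assuming the runs table lives in a SQLite file with an `exit_code` column (both the table name and the storage backend are assumptions for illustration):

```python
import sqlite3

# Hypothetical pinned trigger from the entry above: fires once any
# run has a recorded exit code.
TRIGGER_SQL = "SELECT COUNT(*) FROM runs WHERE exit_code IS NOT NULL"

def trigger_fired(db_path):
    """True once the pinned query finds at least one non-null exit code."""
    with sqlite3.connect(db_path) as conn:
        (count,) = conn.execute(TRIGGER_SQL).fetchone()
    return count > 0
```

Because the check is a pure yes/no probe, the loop can run it on every boot at negligible cost and reopen the feature request the first time it returns true.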
Related patterns
(none yet — see also runbooks, ADRs, and postmortems in general engineering practice; this pattern is the autonomous-loop-specific variant focused on retry conditions.)