Juho Choi

Thin hook manifest

Mon, 04 May 2026 18:29:44 GMT

Thin hook manifest

Have the hook config file declare only what events to subscribe to and route every one of them to a single handler binary; keep all branching, transformation, and policy in that handler.

When to use

Building external tooling (telemetry, audit, policy enforcement) on top of a harness that exposes events through a user-edited config file (e.g. Claude Code's .claude/settings.json).
You expect the integration's logic to evolve faster than the user's machine can be re-configured.
You need typed payload handling, unit tests, or per-event branching that the manifest schema ({matcher, command}) can't express.
Many event kinds want the same treatment ("log everything"), so per-event configuration would just be repetition.

When not to use

A one-off shell hook that does one thing and never changes — "command": "echo done >> ~/log" directly in the manifest is fine; wrapping it in a CLI is overkill.
The manifest schema is already expressive enough for the full policy.
You don't control a fast distribution channel for the handler. If updating the handler is harder than updating the manifest, the leverage flips and the manifest is the right place for logic.
Hot paths where per-event handler-process spawn cost matters more than the cost of fanning out matchers in the manifest — then push filtering into the manifest.

The pattern is illustrative, not a law. It earns its keep when both of these are true: the manifest schema is too narrow for the real policy, AND the handler ships through a faster channel than the manifest does.

Anything baked into the manifest becomes a deployment artifact — changing it forces every user to re-install or hand-edit. Anything in the handler propagates the next time the user updates the package. Hook config schemas are also typically narrow on purpose: matcher + command. Real policy ("truncate this field, parse the transcript file, drop sub-agent events, count tokens") has nowhere to live in JSON.

Pattern

Two parts, with the asymmetry on purpose.

Manifest (thin). Every event of interest points at the same command. No per-event command, no matcher cleverness. Adding a new event type is one line.

{
  "hooks": {
    "PreToolUse":   [{ "matcher": "", "hooks": [{ "type": "command", "command": "myagent hook" }] }],
    "PostToolUse":  [{ "matcher": "", "hooks": [{ "type": "command", "command": "myagent hook" }] }],
    "SessionStart": [{ "matcher": "", "hooks": [{ "type": "command", "command": "myagent hook" }] }],
    "Stop":         [{ "matcher": "", "hooks": [{ "type": "command", "command": "myagent hook" }] }]
  }
}

Handler (thick). One entry point. Every event lands here and branches on the event name in the payload. This is where typed parsing, transcript reads, network sends, truncation, and the never-block safeties live.

myagent hook  (single binary)
  ├─ read JSON from stdin (with short timeout)
  ├─ branch on payload.hook_event_name
  │     ├─ PreToolUse → record tool args
  │     ├─ Stop       → parse transcript, summarize
  │     └─ ...
  ├─ ship to backend (detached, fire-and-forget)
  └─ exit 0 always (try/finally)

Corollaries:

Policy changes ship through the handler's package manager (npm publish, pip upload), not user re-configuration.
Branching is a single-source-of-truth file; reading it tells the whole story of what the integration does.
"Never block the harness" safeties — stdin timeout, detached I/O, catch-all exit 0 — are implemented once at the single sink, not duplicated per event.
The handler's core functions (e.g. buildPayload, convertEventType) are ordinary pure functions, easy to unit-test.

Trade-offs

Process spawn per event. Every tool call spawns the handler. The cost is real (tens of ms each) and adds up on busy sessions. If measurement shows it matters, push a coarse matcher into the manifest to skip uninteresting events at the source.
Opaque manifest. A reader looking at the user's settings.json sees only myagent hook and learns nothing about what runs. You're trading manifest legibility for handler legibility — document the payload contract somewhere reachable.
Single point of failure. If the handler binary isn't on PATH, every event fails. A common mitigation is to make the SessionStart entry self-bootstrapping (command -v myagent || install ...; myagent hook).
Coarse user controls. Users can't easily say "track Bash but not Read" — the manifest doesn't know about the distinction. If user-tunable filtering matters, expose it via the handler's own config or env vars, not the manifest.

Example

The same shape recurs across infrastructure: webhook receivers (URL registration vs. receiver code), Kubernetes (Service/Ingress YAML vs. controller), AWS Lambda + API Gateway (route vs. function), GitHub Actions (on: vs. step script), systemd (ExecStart= vs. binary). The manifest answers what/when, the handler answers how.

A textbook instance for Claude Code is the Argos telemetry CLI: its .claude/settings.json injects identical argos hook entries for every event type, and a single hook command in the CLI does all routing internally.

(none yet)

Human intervention log

Sun, 03 May 2026 01:10:12 GMT

Human intervention log

A single append-only file where an autonomous harness records every task it could not automate, who handled it manually, and the condition under which automation could resume.

When to use

The harness runs an autonomous loop (e.g. ideate → plan → build → commit → check) that is meant to keep going without a human in each iteration.
The loop will inevitably hit work that is physically outside its reach: console-only API key issuance, vendor/legal approvals, DNS at the registrar, payments, security-incident judgment calls, library or platform limitations the agent cannot bypass.
You want future iterations (or future maintainers) to know why a workaround exists and when it would be safe to retry the automated path.

When not to use

One-shot or short-lived agents — there is no later iteration that will read the log.
Copilot-style interactive harnesses where human turns are the design, not the exception. Every action would qualify and the log degenerates into a transcript.
Cases where the limit is genuinely permanent and uninteresting (e.g. "only the CEO can sign this contract"). A single static note in a runbook is enough; you do not need a log entry per occurrence.

Context

An autonomous loop hits two kinds of failure: ones it can retry, and ones it physically cannot. If the human silently absorbs the second kind, three things are lost:

The audit trail — six months later nobody remembers why the metric was switched from successRate to medianDurationMs.
The retry trigger — the condition that would let the loop reclaim this task is in someone's head, not in the repo.
Self-knowledge — the harness has no list of its own ceilings, so it keeps re-attempting impossible work and burning tokens.

The pattern is to make every manual override an explicit, structured entry instead of an undocumented save.

Pattern

Maintain one append-only markdown file in the repo (e.g. docs/human-intervention.md). Each intervention gets one section with four required fields:

## YYYY-MM-DD — short title

- Context: why the harness could not handle it
- Actor: who intervened
- Action: what they actually did
- Re-automatable: yes/no — <trigger condition, with a concrete check>

The four fields are the minimum needed for retrospection, retry, and audit. Drop one and the entry stops being useful:

Context answers "why was this not automated."
Actor answers "who owns the follow-up."
Action answers "what is the current state of the system."
Re-automatable answers "when, if ever, should the loop try again?" The trigger should be checkable without judgment — a SQL query, a feature-flag probe, a vendor-changelog URL, an issue link. A vague "someday when the API improves" is not a trigger.

The harness can optionally read the file (or an index of it) at boot and treat listed items as known off-limits until their trigger fires.

Trade-offs

Discipline tax. The log is only as good as the operator's habit of writing entries. Half-logged interventions are worse than none — they imply completeness that is not there.
Drift. Triggers go stale (vendors ship APIs, internal limits change). The file needs periodic pruning, otherwise old entries fossilize and the loop never retries things it could now handle.
Not a fix. The log makes the autonomy ceiling visible, it does not raise it. A monotonically growing file is a signal the harness is accumulating debt, not paying it down.

Example

A real entry from an autonomous harness: the loop generated a feature request to add a successRate column to its agent-run dashboard. Investigation revealed the underlying tool runner did not surface exit_code for built-in tools, so the data simply did not exist. The intervention recorded:

Context: success/failure data is unavailable for built-in tools at the data-source layer.
Actor: repo owner.
Action: replaced the proposed successRate column with medianDurationMs, which can be computed from existing fields.
Re-automatable: yes — when the tool runner ships exit-code support, OR when a sampled SQL query against the runs table starts returning non-null exit codes. The query is pinned in the entry so a future iteration can evaluate the trigger automatically.

Because the trigger is concrete, a future loop can run the query, see the data has appeared, and reopen the original feature request without a human re-deciding the question.

(none yet — see also runbooks, ADRs, and postmortems in general engineering practice; this pattern is the autonomous-loop-specific variant focused on retry conditions.)