<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Juho Choi</title><description>I build the systems that make AI agents reliable in production.</description><link>https://juhochoi.me/</link><item><title>Thin hook manifest</title><link>https://juhochoi.me/writing/engineering/thin-hook-manifest/</link><guid isPermaLink="true">https://juhochoi.me/writing/engineering/thin-hook-manifest/</guid><description>Have the hook config file declare only *what events to subscribe to* and route every one of them to a single handler binary; keep all branching, transformation, and policy in that handler.</description><pubDate>Mon, 04 May 2026 18:29:44 GMT</pubDate><content:encoded>&lt;h1 id=&quot;thin-hook-manifest&quot;&gt;&lt;a href=&quot;#thin-hook-manifest&quot;&gt;Thin hook manifest&lt;/a&gt;&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;Have the hook config file declare only &lt;em&gt;what events to subscribe to&lt;/em&gt; and route every one of them to a single handler binary; keep all branching, transformation, and policy in that handler.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;when-to-use&quot;&gt;&lt;a href=&quot;#when-to-use&quot;&gt;When to use&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Building external tooling (telemetry, audit, policy enforcement) on top of a harness that exposes events through a user-edited config file (e.g. Claude Code&apos;s &lt;code&gt;.claude/settings.json&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;You expect the integration&apos;s logic to evolve faster than the user&apos;s machine can be re-configured.&lt;/li&gt;
&lt;li&gt;You need typed payload handling, unit tests, or per-event branching that the manifest schema (&lt;code&gt;{matcher, command}&lt;/code&gt;) can&apos;t express.&lt;/li&gt;
&lt;li&gt;Many event kinds want the same treatment (&quot;log everything&quot;), so per-event configuration would just be repetition.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;when-not-to-use&quot;&gt;&lt;a href=&quot;#when-not-to-use&quot;&gt;When not to use&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A one-off shell hook that does one thing and never changes — &lt;code&gt;&quot;command&quot;: &quot;echo done &gt;&gt; ~/log&quot;&lt;/code&gt; directly in the manifest is fine; wrapping it in a CLI is overkill.&lt;/li&gt;
&lt;li&gt;The manifest schema is already expressive enough for the full policy.&lt;/li&gt;
&lt;li&gt;You don&apos;t control a fast distribution channel for the handler. If updating the handler is harder than updating the manifest, the leverage flips and the manifest is the right place for logic.&lt;/li&gt;
&lt;li&gt;Hot paths where per-event handler-process spawn cost matters more than the cost of fanning out matchers in the manifest — then push filtering into the manifest.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;context&quot;&gt;&lt;a href=&quot;#context&quot;&gt;Context&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The pattern is illustrative, not a law. It earns its keep when &lt;em&gt;both&lt;/em&gt; of these are true: the manifest schema is too narrow for the real policy, AND the handler ships through a faster channel than the manifest does.&lt;/p&gt;
&lt;p&gt;Anything baked into the manifest becomes a deployment artifact — changing it forces every user to re-install or hand-edit. Anything in the handler propagates the next time the user updates the package. Hook config schemas are also typically narrow on purpose: matcher + command. Real policy (&quot;truncate this field, parse the transcript file, drop sub-agent events, count tokens&quot;) has nowhere to live in JSON.&lt;/p&gt;
&lt;h2 id=&quot;pattern&quot;&gt;&lt;a href=&quot;#pattern&quot;&gt;Pattern&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Two parts, with the asymmetry on purpose.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Manifest (thin).&lt;/strong&gt; Every event of interest points at the &lt;em&gt;same&lt;/em&gt; command. No per-event command, no matcher cleverness. Adding a new event type is one line.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &quot;hooks&quot;: {
    &quot;PreToolUse&quot;:   [{ &quot;matcher&quot;: &quot;&quot;, &quot;hooks&quot;: [{ &quot;type&quot;: &quot;command&quot;, &quot;command&quot;: &quot;myagent hook&quot; }] }],
    &quot;PostToolUse&quot;:  [{ &quot;matcher&quot;: &quot;&quot;, &quot;hooks&quot;: [{ &quot;type&quot;: &quot;command&quot;, &quot;command&quot;: &quot;myagent hook&quot; }] }],
    &quot;SessionStart&quot;: [{ &quot;matcher&quot;: &quot;&quot;, &quot;hooks&quot;: [{ &quot;type&quot;: &quot;command&quot;, &quot;command&quot;: &quot;myagent hook&quot; }] }],
    &quot;Stop&quot;:         [{ &quot;matcher&quot;: &quot;&quot;, &quot;hooks&quot;: [{ &quot;type&quot;: &quot;command&quot;, &quot;command&quot;: &quot;myagent hook&quot; }] }]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Handler (thick).&lt;/strong&gt; One entry point. Every event lands here and branches on the event name in the payload. This is where typed parsing, transcript reads, network sends, truncation, and the never-block safeties live.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;myagent hook  (single binary)
  ├─ read JSON from stdin (with short timeout)
  ├─ branch on payload.hook_event_name
  │     ├─ PreToolUse → record tool args
  │     ├─ Stop       → parse transcript, summarize
  │     └─ ...
  ├─ ship to backend (detached, fire-and-forget)
  └─ exit 0 always (try/finally)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Corollaries:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Policy changes ship through the handler&apos;s package manager (&lt;code&gt;npm publish&lt;/code&gt;, &lt;code&gt;pip upload&lt;/code&gt;), not user re-configuration.&lt;/li&gt;
&lt;li&gt;Branching is a single-source-of-truth file; reading it tells the whole story of what the integration does.&lt;/li&gt;
&lt;li&gt;&quot;Never block the harness&quot; safeties — stdin timeout, detached I/O, catch-all exit 0 — are implemented once at the single sink, not duplicated per event.&lt;/li&gt;
&lt;li&gt;The handler&apos;s core functions (e.g. &lt;code&gt;buildPayload&lt;/code&gt;, &lt;code&gt;convertEventType&lt;/code&gt;) are ordinary pure functions, easy to unit-test.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;trade-offs&quot;&gt;&lt;a href=&quot;#trade-offs&quot;&gt;Trade-offs&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Process spawn per event.&lt;/strong&gt; Every tool call spawns the handler. The cost is real (tens of ms each) and adds up on busy sessions. If measurement shows it matters, push a coarse matcher into the manifest to skip uninteresting events at the source.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Opaque manifest.&lt;/strong&gt; A reader looking at the user&apos;s &lt;code&gt;settings.json&lt;/code&gt; sees only &lt;code&gt;myagent hook&lt;/code&gt; and learns nothing about what runs. You&apos;re trading manifest legibility for handler legibility — document the payload contract somewhere reachable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Single point of failure.&lt;/strong&gt; If the handler binary isn&apos;t on PATH, every event fails. A common mitigation is to make the &lt;code&gt;SessionStart&lt;/code&gt; entry self-bootstrapping (&lt;code&gt;command -v myagent || install ...; myagent hook&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coarse user controls.&lt;/strong&gt; Users can&apos;t easily say &quot;track Bash but not Read&quot; — the manifest doesn&apos;t know about the distinction. If user-tunable filtering matters, expose it via the handler&apos;s own config or env vars, not the manifest.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;example&quot;&gt;&lt;a href=&quot;#example&quot;&gt;Example&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The same shape recurs across infrastructure: webhook receivers (URL registration vs. receiver code), Kubernetes (&lt;code&gt;Service&lt;/code&gt;/&lt;code&gt;Ingress&lt;/code&gt; YAML vs. controller), AWS Lambda + API Gateway (route vs. function), GitHub Actions (&lt;code&gt;on:&lt;/code&gt; vs. step script), systemd (&lt;code&gt;ExecStart=&lt;/code&gt; vs. binary). The manifest answers &lt;em&gt;what/when&lt;/em&gt;, the handler answers &lt;em&gt;how&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;A textbook instance for Claude Code is the Argos telemetry CLI: its &lt;code&gt;.claude/settings.json&lt;/code&gt; injects identical &lt;code&gt;argos hook&lt;/code&gt; entries for every event type, and a single &lt;code&gt;hook&lt;/code&gt; command in the CLI does all routing internally.&lt;/p&gt;
&lt;h2 id=&quot;related-patterns&quot;&gt;&lt;a href=&quot;#related-patterns&quot;&gt;Related patterns&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(none yet)&lt;/em&gt;&lt;/p&gt;</content:encoded><category>ai-ml</category><category>agent-harness</category><category>patterns</category></item><item><title>Human intervention log</title><link>https://juhochoi.me/writing/engineering/human-intervention-log/</link><guid isPermaLink="true">https://juhochoi.me/writing/engineering/human-intervention-log/</guid><description>A single append-only file where an autonomous harness records every task it could not automate, who handled it manually, and the condition under which automation could resume.</description><pubDate>Sun, 03 May 2026 01:10:12 GMT</pubDate><content:encoded>&lt;h1 id=&quot;human-intervention-log&quot;&gt;&lt;a href=&quot;#human-intervention-log&quot;&gt;Human intervention log&lt;/a&gt;&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;A single append-only file where an autonomous harness records every task it could not automate, who handled it manually, and the condition under which automation could resume.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;when-to-use&quot;&gt;&lt;a href=&quot;#when-to-use&quot;&gt;When to use&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The harness runs an autonomous loop (e.g. ideate → plan → build → commit → check) that is meant to keep going without a human in each iteration.&lt;/li&gt;
&lt;li&gt;The loop will inevitably hit work that is physically outside its reach: console-only API key issuance, vendor/legal approvals, DNS at the registrar, payments, security-incident judgment calls, library or platform limitations the agent cannot bypass.&lt;/li&gt;
&lt;li&gt;You want future iterations (or future maintainers) to know &lt;em&gt;why&lt;/em&gt; a workaround exists and &lt;em&gt;when&lt;/em&gt; it would be safe to retry the automated path.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;when-not-to-use&quot;&gt;&lt;a href=&quot;#when-not-to-use&quot;&gt;When not to use&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;One-shot or short-lived agents — there is no later iteration that will read the log.&lt;/li&gt;
&lt;li&gt;Copilot-style interactive harnesses where human turns are the design, not the exception. Every action would qualify and the log degenerates into a transcript.&lt;/li&gt;
&lt;li&gt;Cases where the limit is genuinely permanent and uninteresting (e.g. &quot;only the CEO can sign this contract&quot;). A single static note in a runbook is enough; you do not need a log entry per occurrence.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;context&quot;&gt;&lt;a href=&quot;#context&quot;&gt;Context&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;An autonomous loop hits two kinds of failure: ones it can retry, and ones it physically cannot. If the human silently absorbs the second kind, three things are lost:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The audit trail — six months later nobody remembers why the metric was switched from &lt;code&gt;successRate&lt;/code&gt; to &lt;code&gt;medianDurationMs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The retry trigger — the condition that would let the loop reclaim this task is in someone&apos;s head, not in the repo.&lt;/li&gt;
&lt;li&gt;Self-knowledge — the harness has no list of its own ceilings, so it keeps re-attempting impossible work and burning tokens.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The pattern is to make every manual override an explicit, structured entry instead of an undocumented save.&lt;/p&gt;
&lt;h2 id=&quot;pattern&quot;&gt;&lt;a href=&quot;#pattern&quot;&gt;Pattern&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Maintain one append-only markdown file in the repo (e.g. &lt;code&gt;docs/human-intervention.md&lt;/code&gt;). Each intervention gets one section with four required fields:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;## YYYY-MM-DD — short title

- Context: why the harness could not handle it
- Actor: who intervened
- Action: what they actually did
- Re-automatable: yes/no — &amp;#x3C;trigger condition, with a concrete check&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The four fields are the minimum needed for retrospection, retry, and audit. Drop one and the entry stops being useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Context&lt;/strong&gt; answers &quot;why was this not automated.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actor&lt;/strong&gt; answers &quot;who owns the follow-up.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action&lt;/strong&gt; answers &quot;what is the current state of the system.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Re-automatable&lt;/strong&gt; answers &quot;when, if ever, should the loop try again?&quot; The trigger should be checkable without judgment — a SQL query, a feature-flag probe, a vendor-changelog URL, an issue link. A vague &quot;someday when the API improves&quot; is not a trigger.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The harness can optionally read the file (or an index of it) at boot and treat listed items as known off-limits until their trigger fires.&lt;/p&gt;
&lt;h2 id=&quot;trade-offs&quot;&gt;&lt;a href=&quot;#trade-offs&quot;&gt;Trade-offs&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Discipline tax.&lt;/strong&gt; The log is only as good as the operator&apos;s habit of writing entries. Half-logged interventions are worse than none — they imply completeness that is not there.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Drift.&lt;/strong&gt; Triggers go stale (vendors ship APIs, internal limits change). The file needs periodic pruning, otherwise old entries fossilize and the loop never retries things it could now handle.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Not a fix.&lt;/strong&gt; The log makes the autonomy ceiling &lt;em&gt;visible&lt;/em&gt;, it does not raise it. A monotonically growing file is a signal the harness is accumulating debt, not paying it down.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;example&quot;&gt;&lt;a href=&quot;#example&quot;&gt;Example&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A real entry from an autonomous harness: the loop generated a feature request to add a &lt;code&gt;successRate&lt;/code&gt; column to its agent-run dashboard. Investigation revealed the underlying tool runner did not surface &lt;code&gt;exit_code&lt;/code&gt; for built-in tools, so the data simply did not exist. The intervention recorded:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Context: success/failure data is unavailable for built-in tools at the data-source layer.&lt;/li&gt;
&lt;li&gt;Actor: repo owner.&lt;/li&gt;
&lt;li&gt;Action: replaced the proposed &lt;code&gt;successRate&lt;/code&gt; column with &lt;code&gt;medianDurationMs&lt;/code&gt;, which can be computed from existing fields.&lt;/li&gt;
&lt;li&gt;Re-automatable: yes — when the tool runner ships exit-code support, OR when a sampled SQL query against the runs table starts returning non-null exit codes. The query is pinned in the entry so a future iteration can evaluate the trigger automatically.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because the trigger is concrete, a future loop can run the query, see the data has appeared, and reopen the original feature request without a human re-deciding the question.&lt;/p&gt;
&lt;h2 id=&quot;related-patterns&quot;&gt;&lt;a href=&quot;#related-patterns&quot;&gt;Related patterns&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;(none yet — see also runbooks, ADRs, and postmortems in general engineering practice; this pattern is the autonomous-loop-specific variant focused on retry conditions.)&lt;/em&gt;&lt;/p&gt;</content:encoded><category>ai-ml</category><category>agent-harness</category><category>patterns</category></item></channel></rss>