KeiSeiKit-1.0/tests/hook-outcome-backfill-test.sh
Parfii-bot be1a864629 feat(outcome-hook): PostToolUse:Agent backfills outcome + stubs in ledger
Phase 4 follow-up: the outcome-backfill hook the kei-model-router
needs to learn from. Without an outcome signal the Beta posterior
sees 205 NULL rows and can never converge → router falls back to
top-tier on every spawn. This hook closes that loop.

Spawned by orchestrator as a code-implementer agent (Sonnet 4.6 by
default after the manifest refactor in cb1fdde — first dogfood proof
that the tier system works end-to-end). Agent returned cleanly with
STATUS-TRUTH MARKER `shipped: functional, stubs: 0`.

Files (3):
- `~/.claude/hooks/agent-outcome-backfill.sh` (73 LOC, /bin/sh) —
  reads PostToolUse:Agent stdin JSON, parses STATUS-TRUTH MARKER from
  `tool_response`, runs `UPDATE agents SET outcome = ?, stubs_count = ?
  WHERE id = ?` via sqlite3 CLI. Defensive on every step (never blocks,
  exits 0 on missing jq / sqlite3 / DB / marker). Bypass:
  `OUTCOME_BACKFILL_BYPASS=1`. Lives outside the kit (system-level).
- `tests/hook-outcome-backfill-test.sh` (79 LOC, /bin/sh) — 8 assertions
  cover: 4 valid outcomes, idempotent re-run, missing marker, bypass
  env, missing sqlite3 (PATH stripped). Run via
  `sh tests/hook-outcome-backfill-test.sh` → "Passed: 8 Failed: 0".
- `_blocks/path-user-hooks.md` — third path-atom following
  user-memory / user-rules convention. Resolves to `~/.claude/hooks/`.
  Lets future manifests reference hook files via
  `path:user-hooks/<file>.sh` opaquely. Registered in registry as
  `atom::md::331b9a34::023e5a08`.

Wiring:
- `~/.claude/settings.json` PostToolUse:Agent matcher chain — appended
  the hook idempotently (jq update preserves existing
  `agent-stub-scan.sh`, `task-timer.sh`, `agent-fork-done.sh`).
- DNA-INDEX regenerated; new path-atom appears in `## Atom (120)`
  section.

Effect: every Agent tool call from now on writes outcome + stubs to
ledger. After ~10-20 invocations the Beta posterior has a usable
prior; after ~50 the router stops defaulting Sonnet to Opus on
unfamiliar tasks. The advisor hook (`model-router-advisor.sh`)
already prints stderr when current model > recommended — orchestrator
needs to actually pass `model:` parameter on next spawn (behavioural,
not a code change).

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: NOT-RUN
behaviour-verified: yes
follow-up-required:
  - PR feat/substrate-path-atoms-2026-05-01 → main when ready

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:24:02 +08:00

79 lines
3 KiB
Bash
Executable file

#!/bin/sh
# hook-outcome-backfill-test.sh — exercises agent-outcome-backfill.sh
# against a temp ledger DB. Asserts UPDATE behaviour for the 4 outcomes,
# missing-marker no-op, bypass no-op, and missing-sqlite3 no-op.
set -u
HOOK="$HOME/.claude/hooks/agent-outcome-backfill.sh"
[ -x "$HOOK" ] || { echo "FAIL: hook not executable at $HOOK"; exit 1; }
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT
DB="$TMP/ledger.sqlite"
export KEI_LEDGER_DB="$DB"
sqlite3 "$DB" "CREATE TABLE agents (
id TEXT PRIMARY KEY,
outcome TEXT CHECK (outcome IN ('functional','partial','scaffolding','fail')),
stubs_count INTEGER DEFAULT 0
);"
PASS=0; FAIL=0
assert_eq() {
if [ "$1" = "$2" ]; then PASS=$((PASS+1));
else FAIL=$((FAIL+1)); echo " FAIL: $3 — got '$1' expected '$2'"; fi
}
run_case() {
# $1=id $2=shipped $3=stubs_count_in_marker
sqlite3 "$DB" "INSERT OR REPLACE INTO agents(id,outcome,stubs_count) VALUES('$1',NULL,0);"
BODY="prelude text
=== STATUS-TRUTH MARKER ===
shipped: $2
stubs: $3
cargo-check: PASS
behaviour-verified: yes
follow-up-required:
- none"
PAYLOAD=$(jq -nc --arg id "$1" --arg body "$BODY" \
'{tool_use_id:$id, tool_response:$body}')
printf '%s' "$PAYLOAD" | "$HOOK"
OUT=$(sqlite3 "$DB" "SELECT outcome||'|'||stubs_count FROM agents WHERE id='$1';")
assert_eq "$OUT" "$2|$3" "outcome=$2"
}
echo "[1] 4 valid outcomes update correctly"
run_case "id-func" "functional" 0
run_case "id-part" "partial" 3
run_case "id-scaf" "scaffolding" 7
run_case "id-fail" "fail" 12
echo "[2] idempotent re-run produces same row"
run_case "id-func" "functional" 0
echo "[3] missing marker → no-op"
sqlite3 "$DB" "INSERT OR REPLACE INTO agents(id,outcome,stubs_count) VALUES('id-bare',NULL,0);"
printf '%s' '{"tool_use_id":"id-bare","tool_response":"just a plain reply"}' | "$HOOK"
OUT=$(sqlite3 "$DB" "SELECT IFNULL(outcome,'NULL')||'|'||stubs_count FROM agents WHERE id='id-bare';")
assert_eq "$OUT" "NULL|0" "no-marker no-op"
echo "[4] bypass env → no-op"
sqlite3 "$DB" "INSERT OR REPLACE INTO agents(id,outcome,stubs_count) VALUES('id-byp',NULL,0);"
BODY="=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0"
PAYLOAD=$(jq -nc --arg id "id-byp" --arg body "$BODY" '{tool_use_id:$id,tool_response:$body}')
printf '%s' "$PAYLOAD" | OUTCOME_BACKFILL_BYPASS=1 "$HOOK"
OUT=$(sqlite3 "$DB" "SELECT IFNULL(outcome,'NULL') FROM agents WHERE id='id-byp';")
assert_eq "$OUT" "NULL" "bypass no-op"
echo "[5] missing sqlite3 → no-op (PATH stripped)"
sqlite3 "$DB" "INSERT OR REPLACE INTO agents(id,outcome,stubs_count) VALUES('id-nosql',NULL,0);"
PAYLOAD=$(jq -nc --arg id "id-nosql" --arg body "$BODY" '{tool_use_id:$id,tool_response:$body}')
JQ_DIR=$(dirname "$(command -v jq)")
printf '%s' "$PAYLOAD" | env -i HOME="$HOME" KEI_LEDGER_DB="$DB" PATH="$JQ_DIR" "$HOOK" 2>/dev/null
OUT=$(sqlite3 "$DB" "SELECT IFNULL(outcome,'NULL') FROM agents WHERE id='id-nosql';")
assert_eq "$OUT" "NULL" "no-sqlite3 no-op"
echo "Passed: $PASS Failed: $FAIL"
[ "$FAIL" -eq 0 ] || exit 1