12-agent audit (2 waves Opus+Sonnet, 6 slices each) flagged 3 HIGH-tier
issues that BOTH waves agreed on, plus 5 doc-honesty findings. This
batch fixes the lot.
== CI green (was failing on main 1207cf5) ==
- _primitives/_rust/Cargo.toml — workspace tokio gains `io-std` feature
(needed by kei-mcp/src/main.rs which calls tokio::io::{stdin,stdout})
- _primitives/_rust/kei-mcp/Cargo.toml — dev-deps tokio gains `test-util`
feature (needed by tests/tools_call_timeout.rs for tokio::time::advance
and Builder::start_paused). Both verified locally:
`cargo check -p kei-mcp` ✓
`cargo test --no-run -p kei-mcp` ✓ (3 test binaries link)
[REAL: ran 2026-05-03 in this session]
== HIGH-tier audit fixes (consensus across waves) ==
1. SQLi escape in agent-outcome-backfill.sh:110
- 4 of 12 agents flagged: TOOL_USE_ID was JSON-derived and
interpolated raw into SQL. Allowlist on $SHIPPED protected today
but a future case-statement removal opened the surface.
- Fix: tiny `_sql_esc` helper that doubles single-quotes (SQL-99
standard escape), applied to SHIPPED + TOOL_USE_ID. STUBS already
integer-validated.
2. PRAGMA user_version=9 in install/sql/outcome-only-schema.sql
- W1 outcome-only critic flagged: the SQL fallback installed a
v9-equivalent flat schema but left user_version=0. A LATER
`kei-ledger init` (e.g. when user upgrades to full kit) would
re-run migrations v1-v9 and ALTER TABLE ADD COLUMN duplicate-error
mid-migration → broken DB.
- Fix: set PRAGMA user_version=9 before COMMIT so the binary's
migration runner sees current ≥ target and short-circuits.
3. backup_file mv→cp + uninstall macOS-portable awk
- W1+W2 outcome-only flagged: lib-backup.sh uses `mv` which DELETES
the target before _jq_merge_hooks runs; `|| true` swallowed the
subsequent jq read-error → silent settings.json loss.
- Fix in lib-profile-outcome-only.sh: `cp -p` aside, drop `|| true`,
return 1 on merge failure (trap restores).
- PROFILE-OUTCOME-ONLY.md uninstall used GNU sed `,+1` extension
which BSD sed (macOS) does not support — uninstall silently
no-op'd on macOS, leaving orphan CLAUDE.md text.
- Fix: replace with portable `awk` recipe; also added `rm -f` for
the agent-toolstats.jsonl sidecar (privacy completeness).
== Doc honesty pass (RULE 0.18 numerics + RULE 0.4 citations) ==
4. README.md count drift — verified all values against filesystem:
* 102→105 Rust crates (Cargo.toml workspace `members` count)
* 67→68 skills (`ls skills/ | wc -l`)
* 35→38 hooks (`grep -c '"command":' settings-snippet.json`)
* 37→38 agent manifests (`ls _manifests/*.toml | wc -l`)
* 82→85 substrate blocks (`find _blocks/ -name '*.md' | wc -l`)
* 18 capability atoms VERIFIED via `find _capabilities/ -name '*.md'`
(encyclopedia §3 row count of 17 is in a separate file and is a
known internal display issue, not changed in this commit)
* 495→565 active DNAs (per docs/DNA-INDEX.md header 2026-05-03)
Each value now carries a `[REAL: <command>]` style trailer per
RULE 0.18.
5. README.md DNA "80-char identity" → "≥33-char variable-length"
- W1+W2 reviewer-pass flagged FALSE: docs/DNA-FORMAT.md SSoT says
minimum 33 chars; 80 was nowhere in code or spec
- Fix in README.md:36 + docs/PHILOSOPHY.md:39 + docs/DNA-INDEX.md:1352
6. README.md "Eleven install profiles (... Cursor / Continue / Zed /
Aider / Docker / Nix)" — Cursor/Continue/Zed/Aider/Docker/Nix were
never install profiles, they were bridge targets
- Fix: list 12 actual profiles from _primitives/MANIFEST.toml,
mention bridges as separate concept
7. .claude-plugin/plugin.json license MIT → Apache-2.0
- W2-Sonnet reviewer flagged: LICENSE file is Apache-2.0 (since
2026-04-30 per NOTICE), but plugin.json still declared MIT —
plugin marketplace would show wrong license
8. docs/ARCHITECTURE.md:318 placeholder URL `https://example.invalid/...`
- W2-Sonnet reviewer flagged: dead link in published docs
- Fix: remove the bad href, describe ssl-rule-file as per-user
install outside the public repo
9. skills/sleep-on-it/SKILL.md Wagner et al. 2004 citation
- W1+W2 reviewer flagged RULE 0.4 violation: citation without
verification marker
- Fix: added [VERIFIED: doi:10.1038/nature02223] + clarification
that the original paper showed slow-wave-sleep (not strictly REM)
insight gain — our metaphor is a loose mapping
10. encyclopedia/substrate-overview.md §5 fabricated TS deps
- W1-Opus doc-consistency flagged RULE 0.4.b violation: 5 of 6
package rows had INVENTED dependency strings
(`recall-ai-sdk ^1.0.0`, `nodemailer-mock ^2.0.0`,
`telegram-typings ^4.10.0`, etc — none exist in the actual
package.json files)
- Fix: regenerated table from real `package.json` reads via
`node -p "require(...).dependencies"` for each of the 6 packages
- Fix: also corrected version drift (5 packages all 0.14.0 now)
Verification:
- Outcome-only end-to-end install against fake $HOME succeeds:
hooks installed, ledger schema at user_version=9, settings.json
created cleanly, all 5 documented files present
[REAL: ran 2026-05-03 in this session]
- `cargo check -p kei-mcp` + `cargo test --no-run -p kei-mcp` clean
Audit findings NOT yet addressed (deferred to next batch):
- README:65 git clone github URL — repo is private; reviewer flagged
external strangers cannot clone; will resolve via Quick Start rewrite
- npm.pkg.github.com / @keisei84 leftover sweep — both waves verified
ZERO refs, no fix needed
- safeEqual timing leak in TS server (W2 sec MEDIUM)
- HTTP server bind 0.0.0.0 (W2 sec MEDIUM)
- Unbounded request body (W2 ci MEDIUM)
- --dry-run silent ignored on non-outcome profiles (W1+W2 MEDIUM)
- Doc-link missing for MEMORY/DNA/LEDGER format specs from README
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
149 lines
6 KiB
Bash
Executable file
149 lines
6 KiB
Bash
Executable file
#!/bin/sh
|
|
# agent-outcome-backfill.sh — PostToolUse:Agent hook.
|
|
#
|
|
# Backfills `outcome` + `stubs_count` columns in kei-ledger after an Agent
|
|
# tool call completes, by parsing the STATUS-TRUTH MARKER block (RULE 0.16)
|
|
# emitted in the agent's final message.
|
|
#
|
|
# Closes the learning loop for kei-model-router: without an outcome signal
|
|
# the Beta posterior never converges and the router falls back to the top
|
|
# tier on every spawn. After ~10-20 invocations the prior becomes useful;
|
|
# after ~50 the router stops defaulting to Opus on unfamiliar tasks.
|
|
#
|
|
# Defensive: never blocks the tool call, never propagates errors, exits 0
|
|
# on every path. Bypass via `OUTCOME_BACKFILL_BYPASS=1`.
|
|
#
|
|
# Production payload shape (verified 2026-05-01 against real Claude Code
|
|
# PostToolUse:Agent stdin):
|
|
# .tool_use_id — string, matches agents.id in kei-ledger
|
|
# .tool_response — object with `.content` (array of {type,text} blocks)
|
|
# plus prompt / status / agentId / agentType / usage etc
|
|
# The `.tool_response.content[*].text` strings carry the agent's final
|
|
# message — that's where the STATUS-TRUTH MARKER lives.
|
|
set -u
|
|
|
|
# Optional debug log. Toggle via `AGENT_OUTCOME_DEBUG=1` for diagnostics
|
|
# when the hook stops firing for some reason. Disabled by default to keep
|
|
# the production path cheap and silent.
|
|
if [ "${AGENT_OUTCOME_DEBUG:-0}" = "1" ]; then
|
|
LOG="$HOME/.claude/agent-outcome-backfill.log"
|
|
PAYLOAD_DBG=$(cat 2>/dev/null || true)
|
|
printf '[%s] invoked, payload-len=%d\n' \
|
|
"$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
|
|
"${#PAYLOAD_DBG}" \
|
|
>> "$LOG" 2>&1 || true
|
|
else
|
|
PAYLOAD_DBG=$(cat 2>/dev/null || true)
|
|
fi
|
|
|
|
# Bypass.
|
|
if [ "${OUTCOME_BACKFILL_BYPASS:-0}" = "1" ]; then
|
|
exit 0
|
|
fi
|
|
|
|
# Tool dependencies — silent no-op if missing.
|
|
command -v jq >/dev/null 2>&1 || exit 0
|
|
command -v sqlite3 >/dev/null 2>&1 || exit 0
|
|
|
|
DB="${KEI_LEDGER_DB:-$HOME/.claude/agents/ledger.sqlite}"
|
|
[ -f "$DB" ] || exit 0
|
|
|
|
PAYLOAD="$PAYLOAD_DBG"
|
|
[ -n "$PAYLOAD" ] || exit 0
|
|
|
|
# Extract tool_use_id (top-level or nested).
|
|
TOOL_USE_ID=$(printf '%s' "$PAYLOAD" | jq -r '.tool_use_id // .toolUseId // empty' 2>/dev/null || true)
|
|
[ -n "$TOOL_USE_ID" ] || exit 0
|
|
|
|
# Extract the agent's final message text. Recursively flattens whatever
|
|
# tool_response shape Claude Code happens to use:
|
|
# string → return as-is
|
|
# array of strings/objects → flatten each, join with newlines
|
|
# object with `.text` → return .text
|
|
# object with `.content` (array) → recurse into content
|
|
# anything else → empty (hook exits below)
|
|
#
|
|
# Verified against the production shape: tool_response is an object with
|
|
# .content[0].text holding the agent's reply. The flatten function reaches
|
|
# the .text field via the content recursion.
|
|
RESPONSE=$(printf '%s' "$PAYLOAD" | jq -r '
|
|
(.tool_response // .toolResponse // "") as $r
|
|
| def flatten:
|
|
if type == "string" then .
|
|
elif type == "array" then map(flatten) | join("\n")
|
|
elif type == "object" then
|
|
if has("text") then .text
|
|
elif has("content") then .content | flatten
|
|
else (. | tostring) end
|
|
else "" end;
|
|
$r | flatten
|
|
' 2>/dev/null || true)
|
|
[ -n "$RESPONSE" ] || exit 0
|
|
|
|
# Locate the STATUS-TRUTH MARKER block. Absent marker is a normal case
|
|
# (read-only / research agents do not emit one) — silent no-op.
|
|
printf '%s' "$RESPONSE" | grep -q '=== STATUS-TRUTH MARKER ===' 2>/dev/null || exit 0
|
|
|
|
# Parse `shipped:` — first match wins, lowercased + trimmed first word.
|
|
SHIPPED=$(printf '%s' "$RESPONSE" \
|
|
| grep -m1 '^shipped:' \
|
|
| sed 's/^shipped:[[:space:]]*//' \
|
|
| awk '{print tolower($1)}' 2>/dev/null || true)
|
|
|
|
# Validate against ledger CHECK constraint domain.
|
|
case "$SHIPPED" in
|
|
functional|partial|scaffolding|fail) ;;
|
|
*) exit 0 ;;
|
|
esac
|
|
|
|
# Parse `stubs:` count — first integer on the line, default 0.
|
|
STUBS=$(printf '%s' "$RESPONSE" \
|
|
| grep -m1 '^stubs:' \
|
|
| sed 's/^stubs:[[:space:]]*//' \
|
|
| grep -oE '[0-9]+' \
|
|
| head -1 2>/dev/null || true)
|
|
[ -n "$STUBS" ] || STUBS=0
|
|
|
|
# Idempotent UPDATE. Failure (locked DB, no row, etc.) → advisory only,
|
|
# never blocks the originating tool call.
|
|
#
|
|
# Audit fix 2026-05-03 (RULE 0.4 / SQLi): TOOL_USE_ID is unsanitised JSON
|
|
# input (potential `'` injection); SHIPPED is allowlist-validated above
|
|
# but defensive escape costs nothing. Replace single-quote with two
|
|
# single-quotes (SQL-standard escape) for ALL string-context variables.
|
|
# STUBS is integer-validated by `grep -oE '[0-9]+'` — already safe.
|
|
_sql_esc() { printf "%s" "$1" | sed "s/'/''/g"; }
|
|
SHIPPED_ESC=$(_sql_esc "$SHIPPED")
|
|
TOOL_USE_ID_ESC=$(_sql_esc "$TOOL_USE_ID")
|
|
sqlite3 "$DB" \
|
|
"UPDATE agents SET outcome='$SHIPPED_ESC', stubs_count=$STUBS WHERE id='$TOOL_USE_ID_ESC';" \
|
|
2>/dev/null || {
|
|
printf '[agent-outcome-backfill] UPDATE failed for id=%s\n' "$TOOL_USE_ID" >&2
|
|
exit 0
|
|
}
|
|
|
|
# Sidecar journal: capture toolStats / totalToolUseCount / totalDurationMs
|
|
# for tool-call-pattern analysis. Lives outside the ledger schema so we
|
|
# don't need a migration on every payload-shape change. Append-only JSONL.
|
|
TOOLSTATS_JSONL="$HOME/.claude/memory/time-metrics/agent-toolstats.jsonl"
|
|
mkdir -p "$(dirname "$TOOLSTATS_JSONL")" 2>/dev/null || true
|
|
printf '%s' "$PAYLOAD" | jq -c \
|
|
--arg id "$TOOL_USE_ID" \
|
|
--arg outcome "$SHIPPED" \
|
|
--arg stubs "$STUBS" \
|
|
'{
|
|
agent_id: $id,
|
|
outcome: $outcome,
|
|
stubs: ($stubs | tonumber),
|
|
ts: now | floor,
|
|
tool_use_count: (.tool_response.totalToolUseCount // null),
|
|
duration_ms: (.tool_response.totalDurationMs // null),
|
|
tool_stats: (.tool_response.toolStats // null),
|
|
tokens_in: (.tool_response.usage.input_tokens // null),
|
|
tokens_out: (.tool_response.usage.output_tokens // null),
|
|
cache_read: (.tool_response.usage.cache_read_input_tokens // null),
|
|
cache_write: (.tool_response.usage.cache_creation_input_tokens // null)
|
|
}' \
|
|
>> "$TOOLSTATS_JSONL" 2>/dev/null || true
|
|
|
|
exit 0
|