Closes the loop on "without full tracking the system can't make decisions" (user pushback on partial coverage). Three gaps that left the inference layer blind are now wired: GAP #1 — agent toolStats / token counts / cache hits captured ================================================================ `agent-outcome-backfill.sh` now appends one JSONL row per spawn to `~/.claude/memory/time-metrics/agent-toolstats.jsonl` with: agent_id, outcome, stubs, ts, tool_use_count, duration_ms, tool_stats {Read:N, Bash:M, ...}, tokens_in, tokens_out, cache_read, cache_write Sidecar journal (no schema migration). Production payload's .tool_response.totalToolUseCount / totalDurationMs / toolStats / usage fields land directly. Smoke-tested with synthetic spawn — row written. GAP #2 — skill_invocations table actually receives writes ================================================================ The `skill_invocations` table (schema v8) had 0 rows because no caller existed for `skill_metrics::record_invocation`. Added two pieces: (a) `kei-ledger record-skill <name> --success {0|1}` CLI subcommand Mirrors record-cost; same dispatch shape. Optional `--agent-id`, `--trajectory-id`, `--duration-ms`, `--db`. Validates non-empty name + duration ≥ 0. Outputs `{"ok":true,"skill":"...","ts":N}`. (b) `hooks/skill-record.sh` — PostToolUse:Skill hook. 50 LOC POSIX. Detects Skill tool calls, derives success heuristic from tool_response (exit_code / status / content non-empty), shells out to `kei-ledger record-skill`. Bypass via SKILL_RECORD_BYPASS=1. 83 kei-ledger tests pass (16 unit + 67 integration). Smoke-tested end-to-end: `kei-ledger record-skill test-skill --success 1` inserts a row with correct fields. Phase D nightly skill-metrics decisions (archive if unused N days, re-extract if success<60% over M days, validated if >20 calls + >90% success) now have data to consume. GAP #3 — numeric-claims.jsonl receives every evidence-tagged claim ================================================================ RULE 0.18 mandated three markers `[REAL:]` / `[FROM-JOURNAL:]` / `[ESTIMATE-HTC:]` on every numeric/duration/cost claim, but no hook appended valid claims to the journal — the calibration data RULE 0.18 promised never accumulated. `hooks/numeric-claims-record.sh` — Stop hook, 140 LOC POSIX. Reads transcript_path from stdin, locates the last assistant message via recursive flatten (same pattern as agent-outcome-backfill.sh after the production-payload-shape fix), regex-extracts every `<phrase> [<TIER>: <pointer>]` triple, appends one JSONL row per claim. Idempotent within 1-second window to avoid double-recording on repeat Stop fires. Bypass via NUMERIC_CLAIMS_RECORD_BYPASS=1. Smoke test: synthetic transcript with 3 markers (REAL + ESTIMATE-HTC + FROM-JOURNAL) produced exactly 3 well-formed JSONL rows. Settings.json ================================================================ - PostToolUse:Skill matcher created (or augmented if already present) with skill-record.sh. - Stop:* matcher gains numeric-claims-record.sh after the existing chain (stop-verify, task-timer, session-end-dump, extract-task- durations, chat-numeric-postflag, affect-threshold-check, enrich-from-jsonl). What this does NOT do (deferred): - Backfill `skill_invocations` from past traces (history started today; Phase D cohort builds forward from now). - Migrate the agent toolStats sidecar JSONL into a proper ledger column. Append-only file is fine for the current scale. - Refactor main.rs (now 233 LOC, was 212; pre-existing CP debt flagged by skill-record agent — separate cleanup PR). === STATUS-TRUTH MARKER === shipped: functional stubs: 0 cargo-check: PASS behaviour-verified: yes follow-up-required: - kei-ledger main.rs Constructor Pattern split (212→233 LOC) - Verify in next session: skill_invocations gets rows from real Skill tool use; numeric-claims.jsonl gets rows from real assistant messages with markers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
140 lines
5 KiB
Bash
Executable file
140 lines
5 KiB
Bash
Executable file
#!/bin/sh
|
|
# numeric-claims-record.sh — Stop event hook (RULE 0.18).
|
|
#
|
|
# Scans the last assistant message for evidence markers
|
|
# ([REAL: ...] / [FROM-JOURNAL: ...] / [ESTIMATE-HTC: ...])
|
|
# and appends one JSONL row per marker to numeric-claims.jsonl.
|
|
#
|
|
# Severity: observability (exit 0 on every path).
|
|
# Bypass: NUMERIC_CLAIMS_RECORD_BYPASS=1
|
|
|
|
[ "${NUMERIC_CLAIMS_RECORD_BYPASS:-0}" = "1" ] && exit 0
|
|
command -v jq >/dev/null 2>&1 || exit 0
|
|
|
|
JOURNAL="$HOME/.claude/memory/time-metrics/numeric-claims.jsonl"
|
|
mkdir -p "$(dirname "$JOURNAL")" 2>/dev/null || exit 0
|
|
|
|
INPUT=$(cat 2>/dev/null || true)
|
|
[ -z "$INPUT" ] && exit 0
|
|
|
|
# Extract transcript_path from Stop event stdin.
|
|
TRANSCRIPT=$(printf '%s' "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null || true)
|
|
|
|
# Fallback: session_id → traces directory (used by chat-numeric-postflag).
|
|
if [ -z "$TRANSCRIPT" ] || [ ! -f "$TRANSCRIPT" ]; then
|
|
SESSION_ID=$(printf '%s' "$INPUT" | jq -r '.session_id // empty' 2>/dev/null || true)
|
|
[ -n "$SESSION_ID" ] && TRANSCRIPT="$HOME/.claude/memory/traces/${SESSION_ID}.jsonl"
|
|
fi
|
|
|
|
[ -f "$TRANSCRIPT" ] || exit 0
|
|
|
|
# session_id for JSONL row — basename without extension.
|
|
SESSION_ID=$(basename "$TRANSCRIPT" .jsonl)
|
|
|
|
# Find the last assistant message line in the transcript JSONL.
|
|
# Match both "type":"assistant" (actual traces) and "role":"assistant" (test/simple format).
|
|
# tac reverses lines; use tail -r on BSD/macOS if tac unavailable.
|
|
LAST_MSG=""
|
|
if command -v tac >/dev/null 2>&1; then
|
|
LAST_MSG=$(tac "$TRANSCRIPT" 2>/dev/null \
|
|
| grep -m1 '"type":"assistant"\|"role":"assistant"' || true)
|
|
else
|
|
LAST_MSG=$(tail -r "$TRANSCRIPT" 2>/dev/null \
|
|
| grep -m1 '"type":"assistant"\|"role":"assistant"' || true)
|
|
fi
|
|
[ -n "$LAST_MSG" ] || exit 0
|
|
|
|
# Flatten text content from the message. Handles two transcript shapes:
|
|
# Real traces: .message.content[*].text (nested under .message)
|
|
# Simple/test: .content[*].text (top-level .content)
|
|
# The recursive flatten walks both.
|
|
TEXT=$(printf '%s' "$LAST_MSG" | jq -r '
|
|
def flatten:
|
|
if type == "string" then .
|
|
elif type == "array" then map(flatten) | join("\n")
|
|
elif type == "object" then
|
|
if has("text") then .text
|
|
elif has("content") then .content | flatten
|
|
else (. | tostring) end
|
|
else "" end;
|
|
(.message // .) | flatten
|
|
' 2>/dev/null || true)
|
|
[ -n "$TEXT" ] || exit 0
|
|
|
|
NOW_ISO=$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || true)
|
|
[ -n "$NOW_ISO" ] || exit 0
|
|
|
|
# append_row: write one JSONL record to the journal.
|
|
# Idempotency guard: skip if last journal row has same value+ts.
|
|
append_row() {
|
|
_tier="$1" # REAL | FROM-JOURNAL | ESTIMATE-HTC
|
|
_value="$2" # context before marker (trimmed)
|
|
_pointer="$3" # marker content
|
|
|
|
if [ -f "$JOURNAL" ]; then
|
|
_last_ts=$(tail -1 "$JOURNAL" 2>/dev/null | jq -r '.ts // empty' 2>/dev/null || true)
|
|
_last_val=$(tail -1 "$JOURNAL" 2>/dev/null | jq -r '.value // empty' 2>/dev/null || true)
|
|
if [ "$_last_val" = "$_value" ] && [ "$_last_ts" = "$NOW_ISO" ]; then
|
|
return
|
|
fi
|
|
fi
|
|
|
|
jq -c -n \
|
|
--arg kind "claim" \
|
|
--arg value "$_value" \
|
|
--arg tier "$_tier" \
|
|
--arg pointer "$_pointer" \
|
|
--arg ts "$NOW_ISO" \
|
|
--arg sid "$SESSION_ID" \
|
|
'{"kind":$kind,"value":$value,"evidence_tier":$tier,"pointer":$pointer,"ts":$ts,"session_id":$sid}' \
|
|
>> "$JOURNAL" 2>/dev/null || true
|
|
}
|
|
|
|
# Scan TEXT for all three marker types using awk.
|
|
# Each JSONL line is processed as a single record (default RS="\n").
|
|
# For each marker found: emit "TIER|value_context|pointer" to stdout.
|
|
MATCHES=$(printf '%s' "$TEXT" | awk '
|
|
{
|
|
line = $0
|
|
while (1) {
|
|
r = index(line, "[REAL:")
|
|
j = index(line, "[FROM-JOURNAL:")
|
|
e = index(line, "[ESTIMATE-HTC:")
|
|
|
|
pos = 0; tier = ""
|
|
if (r > 0 && (pos == 0 || r < pos)) { pos = r; tier = "REAL" }
|
|
if (j > 0 && (pos == 0 || j < pos)) { pos = j; tier = "FROM-JOURNAL" }
|
|
if (e > 0 && (pos == 0 || e < pos)) { pos = e; tier = "ESTIMATE-HTC" }
|
|
if (tier == "") break
|
|
|
|
# Value context: up to 60 chars before the marker.
|
|
start = pos - 60; if (start < 1) start = 1
|
|
value = substr(line, start, pos - start)
|
|
gsub(/^[[:space:]]+|[[:space:]]+$/, "", value)
|
|
|
|
# Pointer: content inside brackets.
|
|
tail_str = substr(line, pos)
|
|
close_pos = index(tail_str, "]")
|
|
if (close_pos == 0) { line = substr(line, pos + 1); continue }
|
|
inner = substr(tail_str, 2, close_pos - 2) # strip outer [ ]
|
|
prefix = tier ": "
|
|
if (substr(inner, 1, length(prefix)) == prefix) {
|
|
inner = substr(inner, length(prefix) + 1)
|
|
}
|
|
gsub(/^[[:space:]]+|[[:space:]]+$/, "", inner)
|
|
|
|
print tier "|" value "|" inner
|
|
|
|
line = substr(line, pos + close_pos)
|
|
}
|
|
}
|
|
')
|
|
|
|
[ -n "$MATCHES" ] || exit 0
|
|
|
|
printf '%s\n' "$MATCHES" | while IFS='|' read -r tier value pointer; do
|
|
[ -n "$tier" ] || continue
|
|
append_row "$tier" "$value" "$pointer"
|
|
done
|
|
|
|
exit 0
|