KeiSeiKit-1.0/hooks
Parfii-bot b332b571bf feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal
Closes the loop on "without full tracking the system can't make decisions"
(user pushback on partial coverage). Three gaps that left the inference
layer blind are now wired:

GAP #1 — agent toolStats / token counts / cache hits captured
================================================================
`agent-outcome-backfill.sh` now appends one JSONL row per spawn to
`~/.claude/memory/time-metrics/agent-toolstats.jsonl` with:
  agent_id, outcome, stubs, ts,
  tool_use_count, duration_ms, tool_stats {Read:N, Bash:M, ...},
  tokens_in, tokens_out, cache_read, cache_write
Sidecar journal (no schema migration). Production payload's
.tool_response.totalToolUseCount / totalDurationMs / toolStats / usage
fields land directly. Smoke-tested with synthetic spawn — row written.

GAP #2 — skill_invocations table actually receives writes
================================================================
The `skill_invocations` table (schema v8) had 0 rows because no caller
existed for `skill_metrics::record_invocation`. Added two pieces:

(a) `kei-ledger record-skill <name> --success {0|1}` CLI subcommand
    Mirrors record-cost; same dispatch shape. Optional `--agent-id`,
    `--trajectory-id`, `--duration-ms`, `--db`. Validates non-empty
    name + duration ≥ 0. Outputs `{"ok":true,"skill":"...","ts":N}`.

(b) `hooks/skill-record.sh` — PostToolUse:Skill hook. 50 LOC POSIX.
    Detects Skill tool calls, derives success heuristic from
    tool_response (exit_code / status / content non-empty), shells
    out to `kei-ledger record-skill`. Bypass via SKILL_RECORD_BYPASS=1.

83 kei-ledger tests pass (16 unit + 67 integration). Smoke-tested
end-to-end: `kei-ledger record-skill test-skill --success 1` inserts
a row with correct fields.

Phase D nightly skill-metrics decisions (archive if unused N days,
re-extract if success<60% over M days, validated if >20 calls + >90%
success) now have data to consume.

GAP #3 — numeric-claims.jsonl receives every evidence-tagged claim
================================================================
RULE 0.18 mandated three markers `[REAL:]` / `[FROM-JOURNAL:]` /
`[ESTIMATE-HTC:]` on every numeric/duration/cost claim, but no hook
appended valid claims to the journal — the calibration data RULE 0.18
promised never accumulated.

`hooks/numeric-claims-record.sh` — Stop hook, 140 LOC POSIX. Reads
transcript_path from stdin, locates the last assistant message via
recursive flatten (same pattern as agent-outcome-backfill.sh after
the production-payload-shape fix), regex-extracts every `<phrase>
[<TIER>: <pointer>]` triple, appends one JSONL row per claim.

Idempotent within 1-second window to avoid double-recording on
repeat Stop fires. Bypass via NUMERIC_CLAIMS_RECORD_BYPASS=1.

Smoke test: synthetic transcript with 3 markers (REAL + ESTIMATE-HTC
+ FROM-JOURNAL) produced exactly 3 well-formed JSONL rows.

Settings.json
================================================================
- PostToolUse:Skill matcher created (or augmented if already
  present) with skill-record.sh.
- Stop:* matcher gains numeric-claims-record.sh after the existing
  chain (stop-verify, task-timer, session-end-dump, extract-task-
  durations, chat-numeric-postflag, affect-threshold-check,
  enrich-from-jsonl).

What this does NOT do (deferred):
  - Backfill `skill_invocations` from past traces (history started
    today; Phase D cohort builds forward from now).
  - Migrate the agent toolStats sidecar JSONL into a proper ledger
    column. Append-only file is fine for the current scale.
  - Refactor main.rs (now 233 LOC, was 212; pre-existing CP debt
    flagged by skill-record agent — separate cleanup PR).

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
behaviour-verified: yes
follow-up-required:
  - kei-ledger main.rs Constructor Pattern split (212→233 LOC)
  - Verify in next session: skill_invocations gets rows from real
    Skill tool use; numeric-claims.jsonl gets rows from real assistant
    messages with markers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:42:09 +08:00
..
_lib KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
affect-live-scan.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
agent-capability-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
agent-capability-verify.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
agent-fork-done.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
agent-fork-logger.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
agent-heartbeat-tick.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
agent-outcome-backfill.sh feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal 2026-05-02 03:42:09 +08:00
agent-stub-scan.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
alignment-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
assemble-agents.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
assemble-validate.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
auto-dev-guard.sh feat(frontend-loop): kei-db-contract primitive + frontend-validator agent + auto-dev-guard hook 2026-05-01 15:34:39 +08:00
auto-encyclopedia-refresh.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
auto-register-on-edit.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
block-dangerous.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
check-error-patterns.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
citation-verify.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
decompose-rules-on-edit.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
destructive-guard.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
disk-headroom-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
disk-reclaim.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
error-spike-detector.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
extract-task-durations.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
hooks.json feat(frontend-loop): kei-db-contract primitive + frontend-validator agent + auto-dev-guard hook 2026-05-01 15:34:39 +08:00
milestone-commit-hook.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
no-downgrade.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
no-hand-edit-agents.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
no-python-without-approval.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
numeric-claims-guard.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
numeric-claims-record.sh feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal 2026-05-02 03:42:09 +08:00
orchestrator-branch-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
orchestrator-dirty-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
phase-b-rem.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
post-commit-audit.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
post-write-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
recurrence-suggest.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
rust-first.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
safety-guard.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
session-end-dump.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
site-wysiwyd-check.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
skill-record.sh feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal 2026-05-02 03:42:09 +08:00
stop-verify.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
task-timer.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00
tomd-preread.sh KeiSeiKit-public — clean state 2026-05-01 12:09:03 +08:00