From 9a3db14b90455ae85411a003ae0c9783c9bd830b Mon Sep 17 00:00:00 2001
From: Parfii-bot <parfionovichd@icloud.com>
Date: Sat, 2 May 2026 04:02:28 +0800
Subject: [PATCH] feat(sleep-sync): mirror time-metrics + ledger snapshots,
 surface in Phase B report
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

User pushback: "what does sleep do now? is everything connected?" Sleep
Phase B was reading only `traces/`, ignoring the four tracking journals
shipped in the previous commit, so the cloud agent had only a partial
view of what happened.

This commit closes the loop. Sleep now sees everything that's tracked.

PUSH SIDE — `kei-sleep-sync.sh` (called on every Stop event)

Now mirrors the full observability surface into the memory-repo:

  ~/.claude/memory/time-metrics/sessions.jsonl        → time-metrics/
  ~/.claude/memory/time-metrics/tasks.jsonl           → time-metrics/
  ~/.claude/memory/time-metrics/numeric-claims.jsonl  → time-metrics/
  ~/.claude/memory/time-metrics/agent-toolstats.jsonl → time-metrics/
  ~/.claude/agents/ledger.sqlite agents table         → ledger/agents.jsonl
  ~/.claude/agents/ledger.sqlite skill_invocations    → ledger/skill_invocations.jsonl

Format: JSONL (one JSON object per line). The two ledger tables are
dumped with `sqlite3` and `json_object()`, one line per table row, so
cloud agents can stream-parse them into pandas / duckdb without
binary-file handling.
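
For illustration, the dump technique can be sketched against a throwaway
database. The `agents_demo` table and its three columns are hypothetical
stand-ins, not the real ledger schema, and the sketch assumes a `sqlite3`
build that includes the JSON functions:

```shell
# Sketch only: dump a SQLite table as JSONL, one json_object() per row.
# agents_demo and its columns are illustrative, not the real ledger schema.
dump_demo_jsonl() {
    command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 not found" >&2; return 0; }
    db=$(mktemp) || return 1
    # An empty file is a valid empty database, so sqlite3 can create tables in it.
    sqlite3 "$db" "CREATE TABLE agents_demo (id INTEGER, model TEXT, outcome TEXT);
                   INSERT INTO agents_demo VALUES (1, 'model-a', 'functional');
                   INSERT INTO agents_demo VALUES (2, 'model-b', NULL);"
    # Each result row is a single JSON text, so the output is already JSONL.
    sqlite3 "$db" "SELECT json_object('id', id, 'model', model, 'outcome', outcome)
                   FROM agents_demo ORDER BY id"
    rm -f "$db"
}

dump_demo_jsonl
```

SQL NULL comes out as JSON `null`, which is why the consume side has to
treat a null `outcome` as its own bucket.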

First sync moved 6 files / 638 rows from local to remote — verified
by `git show --stat` of the resulting `memory: session traces` commit.

CONSUME SIDE — `phase-b-rem.sh` REM-consolidation report

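Each nightly `reports/sleep-YYYY-MM-DD.md` now ends with a "Tracking
observability (last 7 days)" section containing four jq-aggregated
digests. Digests 1 and 2 are filtered to the last 7 days; digests 3
and 4 currently aggregate over the whole journal: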

  1. Agent outcomes — per-model: n, functional/partial/scaffolding/fail
     counts (plus an unknown bucket for null or empty outcomes) and
     total_cost_usd. Lets the agent see whether the model-tier refactor
     (cb1fdde) actually paid off and whether the Sonnet success rate
     justifies routing more task classes to it.

  2. Skill success rates — per-skill: n, successes, rate_pct. Drives
     Phase D nightly decisions (archive unused / re-extract failing /
     mark validated). Stays empty until the Skill tool is invoked in
     the next session.

  3. Numeric-claims tier breakdown — REAL / FROM-JOURNAL / ESTIMATE-HTC
     counts. High ESTIMATE-HTC ratio = orchestrator under-calibrated.
     Cloud agent's job: spot frequent ESTIMATE-HTC categories and
     propose conversion to FROM-JOURNAL via measured runs.

  4. Agent tool-call patterns — mean tool_use_count, mean duration_ms,
     per-tool total calls. Lets the agent see "this code-implementer
     spawn made 30 Read but 1 Edit — was tier-allocation correct?".
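
As a rough worked example of digest 2, the same jq filter can be run
over a few made-up rows. The skill names and rows are invented for the
demo, and `ts` is assumed to be epoch seconds, as the comparison against
`date +%s` in the hook implies:

```shell
# Sketch only: per-skill success rates over sample rows (all data invented).
skill_rates_demo() {
    command -v jq >/dev/null 2>&1 || { echo "jq not found" >&2; return 0; }
    src=$(mktemp) || return 1
    now=$(date +%s)
    cutoff=$(( now - 7*86400 ))
    # Three recent rows plus one row stamped 1970 that the cutoff drops.
    {
        printf '{"skill_name":"alpha","ts":%s,"success":1}\n' "$now"
        printf '{"skill_name":"alpha","ts":%s,"success":0}\n' "$now"
        printf '{"skill_name":"beta","ts":%s,"success":1}\n' "$now"
        printf '{"skill_name":"beta","ts":0,"success":0}\n'
    } > "$src"
    # -s slurps the JSONL into one array; -c prints the result on one line.
    jq -c -s --argjson cutoff "$cutoff" '
      [.[] | select(.ts >= $cutoff)]
      | group_by(.skill_name) | map({
          skill: .[0].skill_name,
          n: length,
          successes: ([.[] | select(.success==1)] | length),
          rate_pct: ((([.[] | select(.success==1)] | length) * 100) / length)
        }) | sort_by(.n) | reverse' "$src"
    rm -f "$src"
}

skill_rates_demo
```

With these rows, alpha reports 1 success out of 2 and beta 1 out of 1,
because the 1970 beta row falls outside the 7-day window.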

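Each of the four sections is skipped when its source JSONL is missing
or empty, or when jq is not installed. jq is the only dependency on the
consume side (already present per existing phase-b checks); the push
side additionally needs sqlite3 for the ledger snapshot and skips that
step when sqlite3 is absent.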
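
The skip behaviour can be sketched as a small helper. `emit_digest` is a
hypothetical name used only here; the hook itself inlines these checks
per section rather than calling a function:

```shell
# Sketch only: print a digest section, or nothing at all when the journal
# is missing/empty or jq is unavailable. emit_digest is a hypothetical helper.
emit_digest() {
    title=$1
    src=$2
    filter=$3
    # -s is false for a missing file and for a zero-byte file.
    [ -s "$src" ] || return 0
    command -v jq >/dev/null 2>&1 || return 0
    echo "### $title"
    jq -c -s "$filter" "$src" 2>/dev/null
    echo
}

# Prints nothing and still returns 0, so the rest of the report is unaffected.
emit_digest "Missing journal" "/nonexistent/claims.jsonl" 'length'
```

Returning 0 on every skip path keeps a missing journal from aborting the
rest of the report.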

What is NOT yet automated:

  - The cloud agent's prompt template doesn't yet INSTRUCT it to act
    on these digests. Currently the digest is data; whether the agent
    proposes rule + hook codification based on it depends on the
    free-text instructions in the schedule. Follow-up: codify a Phase B
    instruction block that maps each digest to a recommendation pattern.

  - Idempotency on `cp` for time-metrics: I use plain `cp` (not `cp -n`)
    so the latest local state always overwrites the remote copy. The
    journals are append-only on the local side, so this is safe. If two
    machines ever shared one memory-repo, though, each push would
    replace the other machine's rows. Out of scope for a single-machine
    setup.
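
The difference between the two `cp` modes can be reproduced in a scratch
directory. The directory and file names below are made up for the demo
and do not touch `~/.claude`:

```shell
# Sketch only: plain cp keeps the mirror current, cp -n freezes the first copy.
demo_cp_modes() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/local" "$work/mirror_plain" "$work/mirror_noclobber"

    # First sync: both mirrors receive the one-row journal.
    echo '{"row":1}' > "$work/local/sessions.jsonl"
    cp "$work/local/sessions.jsonl" "$work/mirror_plain/"
    cp -n "$work/local/sessions.jsonl" "$work/mirror_noclobber/" 2>/dev/null || true

    # The local journal grows by one row, then a second sync runs.
    echo '{"row":2}' >> "$work/local/sessions.jsonl"
    cp "$work/local/sessions.jsonl" "$work/mirror_plain/"
    cp -n "$work/local/sessions.jsonl" "$work/mirror_noclobber/" 2>/dev/null || true

    echo "plain=$(wc -l < "$work/mirror_plain/sessions.jsonl" | tr -d ' ')"
    echo "noclobber=$(wc -l < "$work/mirror_noclobber/sessions.jsonl" | tr -d ' ')"
    rm -rf "$work"
}

demo_cp_modes
```

The plain-`cp` mirror ends up with both rows, while the `cp -n` mirror
still holds only the first one. That is why `cp -n` suits the immutable
per-session trace files but not the growing journals.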

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: NOT-RUN  (pure shell)
behaviour-verified: yes
follow-up-required:
  - Phase B prompt template — instruct cloud agent to act on the four
    digests (codify recurring patterns, calibrate ESTIMATE-HTC).
  - skill_invocations.jsonl will populate from next session onward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 _primitives/kei-sleep-sync.sh | 50 ++++++++++++++++++++-
 hooks/phase-b-rem.sh          | 83 +++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/_primitives/kei-sleep-sync.sh b/_primitives/kei-sleep-sync.sh
index 25de59e..4bf81f5 100755
--- a/_primitives/kei-sleep-sync.sh
+++ b/_primitives/kei-sleep-sync.sh
@@ -51,7 +51,55 @@ if [ -d "$TRACES_SRC" ]; then
     cp -n "$TRACES_SRC"/*.jsonl traces/ 2>/dev/null || true
 fi
 
-git add traces/ backlog.md 2>/dev/null || { log_err "git add failed"; exit 0; }
+# Mirror time-metrics journals (RULE 0.18 + post-2026-05-02 tracking).
+# Append-only JSONL, OK to overwrite remote with local since local is the
+# source-of-truth for this user's machine. Source files:
+#   sessions.jsonl        — RULE 0.18 session-duration journal
+#   tasks.jsonl           — task-timer.sh per-Agent durations
+#   numeric-claims.jsonl  — RULE 0.18 evidence-tagged claims
+#   agent-toolstats.jsonl — agent-outcome-backfill.sh sidecar
+TIME_METRICS_SRC="${HOME}/.claude/memory/time-metrics"
+if [ -d "$TIME_METRICS_SRC" ]; then
+    mkdir -p time-metrics 2>/dev/null || true
+    for f in sessions.jsonl tasks.jsonl numeric-claims.jsonl agent-toolstats.jsonl; do
+        if [ -f "$TIME_METRICS_SRC/$f" ]; then
+            cp "$TIME_METRICS_SRC/$f" "time-metrics/$f" 2>/dev/null || true
+        fi
+    done
+fi
+
+# Snapshot kei-ledger: agents + skill_invocations as JSONL (sqlite3 .dump
+# has too much noise + is binary-ordering-sensitive). Cloud agents can
+# stream-parse JSONL straight into pandas/duckdb for analysis.
+LEDGER_DB="${KEI_LEDGER_DB:-${HOME}/.claude/agents/ledger.sqlite}"
+if [ -f "$LEDGER_DB" ] && command -v sqlite3 >/dev/null 2>&1; then
+    mkdir -p ledger 2>/dev/null || true
+    # sqlite3's `.mode json` / `-json` needs SQLite >= 3.33 and prints one
+    # JSON array, not JSONL; emit one-row-per-line JSON via select+json_object.
+    sqlite3 "$LEDGER_DB" \
+        "SELECT json_object(
+            'id', id, 'branch', branch, 'parent_branch', parent_branch,
+            'spec_sha', spec_sha, 'status', status,
+            'started_ts', started_ts, 'finished_ts', finished_ts,
+            'summary', summary, 'worktree_path', worktree_path,
+            'dna', dna, 'creator_id', creator_id, 'fork_parent_id', fork_parent_id,
+            'cost_micro_cents', cost_micro_cents, 'provider', provider,
+            'model', model, 'tokens_in', tokens_in, 'tokens_out', tokens_out,
+            'stubs_count', stubs_count, 'outcome', outcome,
+            'escalation_depth', escalation_depth, 'task_class_dna', task_class_dna
+         ) FROM agents ORDER BY started_ts" \
+        > ledger/agents.jsonl 2>/dev/null || true
+    sqlite3 "$LEDGER_DB" \
+        "SELECT json_object(
+            'id', id, 'skill_name', skill_name, 'ts', ts,
+            'agent_id', agent_id, 'success', success,
+            'trajectory_id', trajectory_id, 'duration_ms', duration_ms
+         ) FROM skill_invocations ORDER BY ts" \
+        > ledger/skill_invocations.jsonl 2>/dev/null || true
+fi
+
+git add traces/ backlog.md time-metrics/ ledger/ 2>/dev/null \
+    || { log_err "git add failed"; exit 0; }
 
 # Nothing staged — silent exit.
 if git diff --cached --quiet 2>/dev/null; then
diff --git a/hooks/phase-b-rem.sh b/hooks/phase-b-rem.sh
index ee46595..143694a 100755
--- a/hooks/phase-b-rem.sh
+++ b/hooks/phase-b-rem.sh
@@ -118,6 +118,89 @@ REPORT="reports/sleep-$TODAY.md"
   head -100 "$PATTERNS_OUT"
   echo '```'
   echo
+  # ----- Per-axis observability digests (added 2026-05-02) -----
+  # Cloud agents and morning human review get an actionable rollup of the
+  # tracking journals without having to parse multi-thousand-line JSONL.
+  if [ -d "ledger" ] || [ -d "time-metrics" ]; then
+    echo "## Tracking observability (last 7 days)"
+    echo
+  fi
+
+  if [ -f "ledger/agents.jsonl" ] && [ -s "ledger/agents.jsonl" ] \
+       && command -v jq >/dev/null 2>&1; then
+    echo "### Agent outcomes — ledger/agents.jsonl"
+    echo '```'
+    SEVEN_DAYS_AGO=$(( $(date +%s) - 7*86400 ))
+    jq -s --argjson cutoff "$SEVEN_DAYS_AGO" '
+      [.[] | select(.started_ts >= $cutoff)]
+      | group_by(.model) | map({
+          model: .[0].model,
+          n: length,
+          functional: ([.[] | select(.outcome=="functional")] | length),
+          partial: ([.[] | select(.outcome=="partial")] | length),
+          scaffolding: ([.[] | select(.outcome=="scaffolding")] | length),
+          fail: ([.[] | select(.outcome=="fail")] | length),
+          unknown: ([.[] | select(.outcome==null or .outcome=="")] | length),
+          total_cost_usd: (([.[] | .cost_micro_cents // 0] | add) / 100000000)
+        })' ledger/agents.jsonl 2>/dev/null | head -100
+    echo '```'
+    echo
+  fi
+
+  if [ -f "ledger/skill_invocations.jsonl" ] && [ -s "ledger/skill_invocations.jsonl" ] \
+       && command -v jq >/dev/null 2>&1; then
+    echo "### Skill success rates — ledger/skill_invocations.jsonl"
+    echo '```'
+    SEVEN_DAYS_AGO=$(( $(date +%s) - 7*86400 ))
+    jq -s --argjson cutoff "$SEVEN_DAYS_AGO" '
+      [.[] | select(.ts >= $cutoff)]
+      | group_by(.skill_name) | map({
+          skill: .[0].skill_name,
+          n: length,
+          successes: ([.[] | select(.success==1)] | length),
+          rate_pct: ((([.[] | select(.success==1)] | length) * 100) / length)
+        }) | sort_by(.n) | reverse' ledger/skill_invocations.jsonl 2>/dev/null | head -50
+    echo '```'
+    echo
+  fi
+
+  if [ -f "time-metrics/numeric-claims.jsonl" ] \
+       && [ -s "time-metrics/numeric-claims.jsonl" ] \
+       && command -v jq >/dev/null 2>&1; then
+    echo "### Numeric-claims tier breakdown — time-metrics/numeric-claims.jsonl"
+    echo '```'
+    jq -s 'group_by(.evidence_tier) | map({tier: .[0].evidence_tier, n: length})' \
+        time-metrics/numeric-claims.jsonl 2>/dev/null
+    echo '```'
+    echo "_RULE 0.18 health: high ESTIMATE-HTC ratio = orchestrator under-calibrated. Cloud agent should propose converting frequent ESTIMATE-HTC categories into FROM-JOURNAL via measured runs._"
+    echo
+  fi
+
+  if [ -f "time-metrics/agent-toolstats.jsonl" ] \
+       && [ -s "time-metrics/agent-toolstats.jsonl" ] \
+       && command -v jq >/dev/null 2>&1; then
+    echo "### Agent tool-call patterns — time-metrics/agent-toolstats.jsonl"
+    echo '```'
+    jq -s '
+      [.[] | select(.tool_stats != null)] as $rows
+      | ($rows | length) as $n
+      | {
+          n_with_stats: $n,
+          mean_tool_uses: (if $n == 0 then 0
+                          else (($rows | map(.tool_use_count // 0) | add) / $n) end),
+          mean_duration_ms: (if $n == 0 then 0
+                            else (($rows | map(.duration_ms // 0) | add) / $n) end),
+          tool_distribution: (
+            [$rows[] | .tool_stats // {} | to_entries[]]
+            | group_by(.key)
+            | map({tool: .[0].key, total_calls: ([.[] | .value] | add)})
+            | sort_by(.total_calls) | reverse
+          )
+        }' time-metrics/agent-toolstats.jsonl 2>/dev/null
+    echo '```'
+    echo
+  fi
+
   echo "## For human review"
   echo
   echo "- Anything in patterns above appearing >=3 times across sessions deserves a rule + hook"