feat(sleep-sync): mirror time-metrics + ledger snapshots, surface in Phase B report
User pushback: "what does sleep do now? is everything connected?" Sleep
Phase B was reading only `traces/`, ignoring the four tracking journals
shipped in the previous commit, so the cloud agent had only a partial view
of what happened. This commit closes the loop: sleep now sees everything
that is tracked.
PUSH SIDE — `kei-sleep-sync.sh` (called on every Stop event)
Now mirrors the full observability surface into the memory-repo:
  ~/.claude/memory/time-metrics/sessions.jsonl          → time-metrics/
  ~/.claude/memory/time-metrics/tasks.jsonl             → time-metrics/
  ~/.claude/memory/time-metrics/numeric-claims.jsonl    → time-metrics/
  ~/.claude/memory/time-metrics/agent-toolstats.jsonl   → time-metrics/
  ~/.claude/agents/ledger.sqlite agents table           → ledger/agents.jsonl
  ~/.claude/agents/ledger.sqlite skill_invocations      → ledger/skill_invocations.jsonl
Format: JSONL (one JSON object per line, one line per table row). The two
ledger tables are dumped via `sqlite3` + `json_object()` so cloud agents can
stream-parse them into pandas / duckdb without handling the binary SQLite
file.
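A minimal sketch of the dump technique described above, run against a throwaway database. The table `demo_agents`, its columns, and the temp paths are illustrative assumptions, not the real ledger schema. It assumes a `sqlite3` build that includes the JSON functions and skips quietly otherwise, in the same spirit as the sync script.

```shell
#!/bin/sh
# Sketch: dump a SQLite table as JSONL (one JSON object per line).
# `demo_agents` and its columns are hypothetical stand-ins for the ledger tables.
workdir=$(mktemp -d)
db="$workdir/demo.sqlite"
out="$workdir/demo_agents.jsonl"

# Only proceed if sqlite3 exists and understands json_object().
if command -v sqlite3 >/dev/null 2>&1 \
    && sqlite3 :memory: "SELECT json_object('a', 1)" >/dev/null 2>&1; then
  sqlite3 "$db" "
    CREATE TABLE demo_agents (id INTEGER, model TEXT, outcome TEXT);
    INSERT INTO demo_agents VALUES (1, 'sonnet', 'functional');
    INSERT INTO demo_agents VALUES (2, 'opus', NULL);"

  # json_object() builds one object per row; sqlite3 prints one row per line.
  sqlite3 "$db" \
    "SELECT json_object('id', id, 'model', model, 'outcome', outcome)
       FROM demo_agents ORDER BY id" > "$out"

  cat "$out"
else
  echo "sqlite3 with JSON support not found, skipping dump" >&2
fi
```

A SQL NULL comes out as JSON `null`, which is why the report side has to treat a missing outcome as its own bucket.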
First sync moved 6 files / 638 rows from local to remote — verified
by `git show --stat` of the resulting `memory: session traces` commit.
CONSUME SIDE — `phase-b-rem.sh` REM-consolidation report
Each nightly `reports/sleep-YYYY-MM-DD.md` now ends with a "Tracking
observability (last 7 days)" section containing four jq-aggregated
digests:
1. Agent outcomes (per model): n, functional/partial/scaffolding/fail/
   unknown counts + total_cost_usd. Lets the agent see whether the
   model-tier refactor (cb1fdde) actually paid off and whether Sonnet
   success rate justifies routing more task classes to it.
2. Skill success rates (per skill): n, successes, rate_pct. Drives
   Phase D nightly decisions (archive unused / re-extract failing /
   mark validated). Empty until Skill tool is invoked in the next
   session.
3. Numeric-claims tier breakdown: REAL / FROM-JOURNAL / ESTIMATE-HTC
   counts. High ESTIMATE-HTC ratio = orchestrator under-calibrated.
   Cloud agent's job: spot frequent ESTIMATE-HTC categories and
   propose conversion to FROM-JOURNAL via measured runs.
4. Agent tool-call patterns: mean tool_use_count, mean duration_ms,
   per-tool total calls. Lets the agent ask "this code-implementer
   spawn made 30 Read calls but 1 Edit: was tier-allocation correct?".
All four sections gracefully skip if the source JSONL is missing or
empty. jq is the only new dependency (already present per existing
phase-b checks).
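To make the digest shape concrete, here is a small sketch of the per-model outcome rollup on made-up rows. The sample data and temp paths are invented for illustration, and it assumes `started_ts` holds epoch seconds (which the comparison against `date +%s` in the report script suggests). It skips quietly when `jq` is absent.

```shell
#!/bin/sh
# Sketch: per-model outcome digest over the last 7 days, on made-up rows.
workdir=$(mktemp -d)
sample="$workdir/agents.jsonl"
now=$(date +%s)
cutoff=$(( now - 7*86400 ))
old=$(( now - 30*86400 ))   # falls outside the 7-day window

# Illustrative rows only; costs are in micro-cents (1 USD = 100000000).
cat > "$sample" <<EOF
{"model":"sonnet","outcome":"functional","started_ts":$now,"cost_micro_cents":150000000}
{"model":"sonnet","outcome":"fail","started_ts":$now,"cost_micro_cents":50000000}
{"model":"opus","outcome":"functional","started_ts":$old,"cost_micro_cents":900000000}
EOF

if command -v jq >/dev/null 2>&1; then
  # -s slurps all lines into one array so group_by can see every row.
  digest=$(jq -c -s --argjson cutoff "$cutoff" '
    [.[] | select(.started_ts >= $cutoff)]
    | group_by(.model)
    | map({
        model: .[0].model,
        n: length,
        functional: ([.[] | select(.outcome == "functional")] | length),
        fail: ([.[] | select(.outcome == "fail")] | length),
        total_cost_usd: (([.[] | .cost_micro_cents // 0] | add) / 100000000)
      })' "$sample")
  echo "$digest"
else
  echo "jq not found, skipping digest" >&2
fi
```

The 30-day-old row is filtered out before grouping, so only the `sonnet` group appears, with two runs and a total cost of 2 USD.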
What is NOT yet automated:
- The cloud agent's prompt template doesn't yet INSTRUCT it to act
  on these digests. Currently the digest is data only; whether the agent
  proposes rule + hook codification based on it depends on the
  free-text instructions in the schedule. Follow-up: codify a Phase B
  instruction block that maps each digest to a recommendation pattern.
- Idempotency on `cp` for time-metrics: I use plain `cp` (not `cp -n`)
  so the latest local state always overwrites the remote copy. The
  journals are append-only on the local side, so this is safe. If two
  machines ever shared one memory-repo, however, a sync from one machine
  would replace the file and drop rows that only the other machine had
  written. Out of scope for the single-machine setup.
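The difference between the two copy modes can be seen in a throwaway directory. This is a sketch with made-up file contents and temp paths, not the sync script itself.

```shell
#!/bin/sh
# Sketch: `cp -n` keeps an existing destination, plain `cp` replaces it.
workdir=$(mktemp -d)
mkdir -p "$workdir/local" "$workdir/repo"

# The local journal is append-only: it starts with one row, then grows.
printf '{"row":1}\n' > "$workdir/local/sessions.jsonl"
cp "$workdir/local/sessions.jsonl" "$workdir/repo/sessions.jsonl"
printf '{"row":2}\n' >> "$workdir/local/sessions.jsonl"

# No-clobber copy: the destination exists, so the new row is NOT mirrored.
# Some cp versions report a skipped file via exit status, hence `|| true`.
cp -n "$workdir/local/sessions.jsonl" "$workdir/repo/sessions.jsonl" 2>/dev/null || true
rows_after_no_clobber=$(wc -l < "$workdir/repo/sessions.jsonl" | tr -d ' ')

# Plain copy: the destination is replaced with the full local journal.
cp "$workdir/local/sessions.jsonl" "$workdir/repo/sessions.jsonl"
rows_after_plain=$(wc -l < "$workdir/repo/sessions.jsonl" | tr -d ' ')

echo "cp -n: $rows_after_no_clobber row(s), cp: $rows_after_plain row(s)"
```

With `cp -n` the mirrored journal would stay frozen at its first snapshot, which is acceptable for write-once trace files but not for journals that keep growing.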
=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: NOT-RUN (pure shell)
behaviour-verified: yes
follow-up-required:
- Phase B prompt template — instruct cloud agent to act on the four
digests (codify recurring patterns, calibrate ESTIMATE-HTC).
- skill_invocations.jsonl will populate from next session onward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent e073df6c98
commit 9a3db14b90
2 changed files with 132 additions and 1 deletion
@@ -51,7 +51,55 @@ if [ -d "$TRACES_SRC" ]; then
     cp -n "$TRACES_SRC"/*.jsonl traces/ 2>/dev/null || true
 fi
 
-git add traces/ backlog.md 2>/dev/null || { log_err "git add failed"; exit 0; }
+# Mirror time-metrics journals (RULE 0.18 + post-2026-05-02 tracking).
+# Append-only JSONL, OK to overwrite remote with local since local is the
+# source-of-truth for this user's machine. Source files:
+#   sessions.jsonl — RULE 0.18 session-duration journal
+#   tasks.jsonl — task-timer.sh per-Agent durations
+#   numeric-claims.jsonl — RULE 0.18 evidence-tagged claims
+#   agent-toolstats.jsonl — agent-outcome-backfill.sh sidecar
+TIME_METRICS_SRC="${HOME}/.claude/memory/time-metrics"
+if [ -d "$TIME_METRICS_SRC" ]; then
+    mkdir -p time-metrics 2>/dev/null || true
+    for f in sessions.jsonl tasks.jsonl numeric-claims.jsonl agent-toolstats.jsonl; do
+        if [ -f "$TIME_METRICS_SRC/$f" ]; then
+            cp "$TIME_METRICS_SRC/$f" "time-metrics/$f" 2>/dev/null || true
+        fi
+    done
+fi
+
+# Snapshot kei-ledger: agents + skill_invocations as JSONL (sqlite3 .dump
+# has too much noise + is binary-ordering-sensitive). Cloud agents can
+# stream-parse JSONL straight into pandas/duckdb for analysis.
+LEDGER_DB="${KEI_LEDGER_DB:-${HOME}/.claude/agents/ledger.sqlite}"
+if [ -f "$LEDGER_DB" ] && command -v sqlite3 >/dev/null 2>&1; then
+    mkdir -p ledger 2>/dev/null || true
+    # `-newline` mode + `-cmd .mode json` would be cleaner but isn't
+    # universally available; emit one-row-per-line JSON via select+json_object.
+    sqlite3 "$LEDGER_DB" \
+        "SELECT json_object(
+            'id', id, 'branch', branch, 'parent_branch', parent_branch,
+            'spec_sha', spec_sha, 'status', status,
+            'started_ts', started_ts, 'finished_ts', finished_ts,
+            'summary', summary, 'worktree_path', worktree_path,
+            'dna', dna, 'creator_id', creator_id, 'fork_parent_id', fork_parent_id,
+            'cost_micro_cents', cost_micro_cents, 'provider', provider,
+            'model', model, 'tokens_in', tokens_in, 'tokens_out', tokens_out,
+            'stubs_count', stubs_count, 'outcome', outcome,
+            'escalation_depth', escalation_depth, 'task_class_dna', task_class_dna
+        ) FROM agents ORDER BY started_ts" \
+        > ledger/agents.jsonl 2>/dev/null || true
+    sqlite3 "$LEDGER_DB" \
+        "SELECT json_object(
+            'id', id, 'skill_name', skill_name, 'ts', ts,
+            'agent_id', agent_id, 'success', success,
+            'trajectory_id', trajectory_id, 'duration_ms', duration_ms
+        ) FROM skill_invocations ORDER BY ts" \
+        > ledger/skill_invocations.jsonl 2>/dev/null || true
+fi
+
+git add traces/ backlog.md time-metrics/ ledger/ 2>/dev/null \
+    || { log_err "git add failed"; exit 0; }
 
 # Nothing staged — silent exit.
 if git diff --cached --quiet 2>/dev/null; then
@@ -118,6 +118,89 @@ REPORT="reports/sleep-$TODAY.md"
 head -100 "$PATTERNS_OUT"
 echo '```'
 echo
+# ----- Per-axis observability digests (added 2026-05-02) -----
+# Cloud agents and morning human review get an actionable rollup of the
+# tracking journals without having to parse multi-thousand-line JSONL.
+if [ -d "ledger" ] || [ -d "time-metrics" ]; then
+    echo "## Tracking observability (last 7 days)"
+    echo
+fi
+
+if [ -f "ledger/agents.jsonl" ] && [ -s "ledger/agents.jsonl" ] \
+    && command -v jq >/dev/null 2>&1; then
+    echo "### Agent outcomes — ledger/agents.jsonl"
+    echo '```'
+    SEVEN_DAYS_AGO=$(( $(date +%s) - 7*86400 ))
+    jq -s --argjson cutoff "$SEVEN_DAYS_AGO" '
+        [.[] | select(.started_ts >= $cutoff)]
+        | group_by(.model) | map({
+            model: .[0].model,
+            n: length,
+            functional: ([.[] | select(.outcome=="functional")] | length),
+            partial: ([.[] | select(.outcome=="partial")] | length),
+            scaffolding: ([.[] | select(.outcome=="scaffolding")] | length),
+            fail: ([.[] | select(.outcome=="fail")] | length),
+            unknown: ([.[] | select(.outcome==null or .outcome=="")] | length),
+            total_cost_usd: (([.[] | .cost_micro_cents // 0] | add) / 100000000)
+        })' ledger/agents.jsonl 2>/dev/null | head -100
+    echo '```'
+    echo
+fi
+
+if [ -f "ledger/skill_invocations.jsonl" ] && [ -s "ledger/skill_invocations.jsonl" ] \
+    && command -v jq >/dev/null 2>&1; then
+    echo "### Skill success rates — ledger/skill_invocations.jsonl"
+    echo '```'
+    SEVEN_DAYS_AGO=$(( $(date +%s) - 7*86400 ))
+    jq -s --argjson cutoff "$SEVEN_DAYS_AGO" '
+        [.[] | select(.ts >= $cutoff)]
+        | group_by(.skill_name) | map({
+            skill: .[0].skill_name,
+            n: length,
+            successes: ([.[] | select(.success==1)] | length),
+            rate_pct: ((([.[] | select(.success==1)] | length) * 100) / length)
+        }) | sort_by(.n) | reverse' ledger/skill_invocations.jsonl 2>/dev/null | head -50
+    echo '```'
+    echo
+fi
+
+if [ -f "time-metrics/numeric-claims.jsonl" ] \
+    && [ -s "time-metrics/numeric-claims.jsonl" ] \
+    && command -v jq >/dev/null 2>&1; then
+    echo "### Numeric-claims tier breakdown — time-metrics/numeric-claims.jsonl"
+    echo '```'
+    jq -s 'group_by(.evidence_tier) | map({tier: .[0].evidence_tier, n: length})' \
+        time-metrics/numeric-claims.jsonl 2>/dev/null
+    echo '```'
+    echo "_RULE 0.18 health: high ESTIMATE-HTC ratio = orchestrator under-calibrated. Cloud agent should propose converting frequent ESTIMATE-HTC categories into FROM-JOURNAL via measured runs._"
+    echo
+fi
+
+if [ -f "time-metrics/agent-toolstats.jsonl" ] \
+    && [ -s "time-metrics/agent-toolstats.jsonl" ] \
+    && command -v jq >/dev/null 2>&1; then
+    echo "### Agent tool-call patterns — time-metrics/agent-toolstats.jsonl"
+    echo '```'
+    jq -s '
+        [.[] | select(.tool_stats != null)] as $rows
+        | ($rows | length) as $n
+        | {
+            n_with_stats: $n,
+            mean_tool_uses: (if $n == 0 then 0
+                else (($rows | map(.tool_use_count // 0) | add) / $n) end),
+            mean_duration_ms: (if $n == 0 then 0
+                else (($rows | map(.duration_ms // 0) | add) / $n) end),
+            tool_distribution: (
+                [$rows[] | .tool_stats // {} | to_entries[]]
+                | group_by(.key)
+                | map({tool: .[0].key, total_calls: ([.[] | .value] | add)})
+                | sort_by(.total_calls) | reverse
+            )
+        }' time-metrics/agent-toolstats.jsonl 2>/dev/null
+    echo '```'
+    echo
+fi
+
 echo "## For human review"
 echo
 echo "- Anything in patterns above appearing >=3 times across sessions deserves a rule + hook"