Commit graph

5 commits

Author SHA1 Message Date
Parfii-bot
9a3db14b90 feat(sleep-sync): mirror time-metrics + ledger snapshots, surface in Phase B report
User pushback: "что теперь делает сон? все связано?" — Sleep Phase B
was reading only `traces/`, ignoring the four tracking journals shipped
in the previous commit. Cloud agent had a partial view of what happened.

This commit closes the loop. Sleep now sees everything that's tracked.

PUSH SIDE — `kei-sleep-sync.sh` (called on every Stop event)

Now mirrors the full observability surface into the memory-repo:

  ~/.claude/memory/time-metrics/sessions.jsonl       → time-metrics/
  ~/.claude/memory/time-metrics/tasks.jsonl          → time-metrics/
  ~/.claude/memory/time-metrics/numeric-claims.jsonl → time-metrics/
  ~/.claude/memory/time-metrics/agent-toolstats.jsonl→ time-metrics/
  ~/.claude/agents/ledger.sqlite agents table        → ledger/agents.jsonl
  ~/.claude/agents/ledger.sqlite skill_invocations   → ledger/skill_invocations.jsonl

Format: JSONL (one row per object). The two ledger tables are dumped
via `sqlite3 + json_object()` so cloud agents can stream-parse into
pandas / duckdb without binary-file handling.

First sync moved 6 files / 638 rows from local to remote — verified
by `git show --stat` of the resulting `memory: session traces` commit.

CONSUME SIDE — `phase-b-rem.sh` REM-consolidation report

Each nightly `reports/sleep-YYYY-MM-DD.md` now ends with a "Tracking
observability (last 7 days)" section containing four jq-aggregated
digests:

  1. Agent outcomes — per-model: n, functional/partial/scaffolding/fail
     counts + total_cost_usd. Lets the agent see whether the model-tier
     refactor (cb1fdde) actually paid off and whether Sonnet success
     rate justifies routing more task classes to it.

  2. Skill success rates — per-skill: n, successes, rate_pct. Drives
     Phase D nightly decisions (archive unused / re-extract failing /
     mark validated). Empty until Skill tool is invoked in the next
     session.

  3. Numeric-claims tier breakdown — REAL / FROM-JOURNAL / ESTIMATE-HTC
     counts. High ESTIMATE-HTC ratio = orchestrator under-calibrated.
     Cloud agent's job: spot frequent ESTIMATE-HTC categories and
     propose conversion to FROM-JOURNAL via measured runs.

  4. Agent tool-call patterns — mean tool_use_count, mean duration_ms,
     per-tool total calls. Lets the agent see "this code-implementer
     spawn made 30 Read but 1 Edit — was tier-allocation correct?".

All four sections gracefully skip if the source JSONL is missing or
empty. jq is the only new dependency (already present per existing
phase-b checks).

What is NOT yet automated:

  - The cloud agent's prompt template doesn't yet INSTRUCT it to act
    on these digests. Currently the digest is data; whether the agent
    proposes rule + hook codification based on it depends on the
    free-text instructions in the schedule. Follow-up: codify a Phase B
    instruction block that maps each digest to a recommendation pattern.

  - Idempotency on `cp` for time-metrics: I use plain `cp` (not `cp -n`)
    so the latest local state always overwrites remote. The journals are
    append-only on the local side, so this is safe — but if two machines
    ever share one memory-repo it would corrupt. Out of scope for
    single-machine setup.

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: NOT-RUN  (pure shell)
behaviour-verified: yes
follow-up-required:
  - Phase B prompt template — instruct cloud agent to act on the four
    digests (codify recurring patterns, calibrate ESTIMATE-HTC).
  - skill_invocations.jsonl will populate from next session onward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 04:02:28 +08:00
Parfii-bot
e073df6c98 feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal
Closes the loop on "without full tracking the system can't make decisions"
(user pushback on partial coverage). Three gaps that left the inference
layer blind are now wired:

GAP #1 — agent toolStats / token counts / cache hits captured
================================================================
`agent-outcome-backfill.sh` now appends one JSONL row per spawn to
`~/.claude/memory/time-metrics/agent-toolstats.jsonl` with:
  agent_id, outcome, stubs, ts,
  tool_use_count, duration_ms, tool_stats {Read:N, Bash:M, ...},
  tokens_in, tokens_out, cache_read, cache_write
Sidecar journal (no schema migration). Production payload's
.tool_response.totalToolUseCount / totalDurationMs / toolStats / usage
fields land directly. Smoke-tested with synthetic spawn — row written.

GAP #2 — skill_invocations table actually receives writes
================================================================
The `skill_invocations` table (schema v8) had 0 rows because no caller
existed for `skill_metrics::record_invocation`. Added two pieces:

(a) `kei-ledger record-skill <name> --success {0|1}` CLI subcommand
    Mirrors record-cost; same dispatch shape. Optional `--agent-id`,
    `--trajectory-id`, `--duration-ms`, `--db`. Validates non-empty
    name + duration ≥ 0. Outputs `{"ok":true,"skill":"...","ts":N}`.

(b) `hooks/skill-record.sh` — PostToolUse:Skill hook. 50 LOC POSIX.
    Detects Skill tool calls, derives success heuristic from
    tool_response (exit_code / status / content non-empty), shells
    out to `kei-ledger record-skill`. Bypass via SKILL_RECORD_BYPASS=1.

83 kei-ledger tests pass (16 unit + 67 integration). Smoke-tested
end-to-end: `kei-ledger record-skill test-skill --success 1` inserts
a row with correct fields.

Phase D nightly skill-metrics decisions (archive if unused N days,
re-extract if success<60% over M days, validated if >20 calls + >90%
success) now have data to consume.

GAP #3 — numeric-claims.jsonl receives every evidence-tagged claim
================================================================
RULE 0.18 mandated three markers `[REAL:]` / `[FROM-JOURNAL:]` /
`[ESTIMATE-HTC:]` on every numeric/duration/cost claim, but no hook
appended valid claims to the journal — the calibration data RULE 0.18
promised never accumulated.

`hooks/numeric-claims-record.sh` — Stop hook, 140 LOC POSIX. Reads
transcript_path from stdin, locates the last assistant message via
recursive flatten (same pattern as agent-outcome-backfill.sh after
the production-payload-shape fix), regex-extracts every `<phrase>
[<TIER>: <pointer>]` triple, appends one JSONL row per claim.

Idempotent within 1-second window to avoid double-recording on
repeat Stop fires. Bypass via NUMERIC_CLAIMS_RECORD_BYPASS=1.

Smoke test: synthetic transcript with 3 markers (REAL + ESTIMATE-HTC
+ FROM-JOURNAL) produced exactly 3 well-formed JSONL rows.

Settings.json
================================================================
- PostToolUse:Skill matcher created (or augmented if already
  present) with skill-record.sh.
- Stop:* matcher gains numeric-claims-record.sh after the existing
  chain (stop-verify, task-timer, session-end-dump, extract-task-
  durations, chat-numeric-postflag, affect-threshold-check,
  enrich-from-jsonl).

What this does NOT do (deferred):
  - Backfill `skill_invocations` from past traces (history started
    today; Phase D cohort builds forward from now).
  - Migrate the agent toolStats sidecar JSONL into a proper ledger
    column. Append-only file is fine for the current scale.
  - Refactor main.rs (now 233 LOC, was 212; pre-existing CP debt
    flagged by skill-record agent — separate cleanup PR).

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
behaviour-verified: yes
follow-up-required:
  - kei-ledger main.rs Constructor Pattern split (212→233 LOC)
  - Verify in next session: skill_invocations gets rows from real
    Skill tool use; numeric-claims.jsonl gets rows from real assistant
    messages with markers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:42:09 +08:00
Parfii-bot
033b9efbad fix(outcome-hook): production payload uses object.content[*].text shape
Hook never fired in production despite passing unit tests. Diagnosed
via debug-log + payload dump: real Claude Code PostToolUse:Agent sends
`tool_response` as an OBJECT (not string, not array), with the agent's
reply at `tool_response.content[0].text` — keys: agentId / agentType /
content / prompt / status / toolStats / totalDurationMs / totalTokens
/ totalToolUseCount / usage.

Original jq filter handled string + object (`$r.content // $r.text`)
but `$r.content` returns the array verbatim; `jq -r` then dumps the
JSON literal which has `\n` as escape sequences, defeating the
`grep -m1 '^shipped:'` line-anchor.

Fix: recursive `flatten` jq function:
  string                     → as-is
  array of any               → recurse, join "\n"
  object with .text          → return .text
  object with .content       → recurse into content
  anything else              → ""

Verified end-to-end: latest 4 code-implementer spawns now write
outcome=functional to ledger correctly. Beta posterior in
kei-model-router begins receiving signal.

Production cleanup:
- Removed verbose debug-log + payload-dump diagnostic. Toggle via
  `AGENT_OUTCOME_DEBUG=1` env if hook stops firing in some future
  Claude Code version.
- Hook source committed to `hooks/agent-outcome-backfill.sh` so
  `install.sh` deploys it on fresh installs (was only in user-home
  previously — gap from `feat/substrate-path-atoms` agent run).

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: NOT-RUN
behaviour-verified: yes
follow-up-required:
  - none

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 01:09:15 +08:00
Parfii-bot
f3f5f79760 feat(frontend-loop): kei-db-contract primitive + frontend-validator agent + auto-dev-guard hook
Frontend continuous-quality loop landed. Three composable cubes:

Wave 1 — kei-db-contract primitive (~870 LOC, 7 cubes per Constructor Pattern):
- Diffs SQL CREATE TABLE migrations against TypeScript type/interface declarations
- 4 drift modes: ORPHAN-SQL, ORPHAN-TS, TYPE-MISMATCH, NULL-MISMATCH
- Reuses sqlparser-rs (Apache 2.0) + regex + walkdir + serde_json + clap
- CLI: kei-db-contract <project-root> [--output json|text] [--strict]
- 5/5 integration tests pass (cargo check + cargo test green)
- Smoke-tested on keisei-marketplace: drift_count=266 across 30 tables
  (expected — marketplace uses raw better-sqlite3 without explicit row types)

Wave 2 — frontend-validator agent + dev-guard skill extension:
- New _manifests/frontend-validator.toml (substrate_role: edit-local, tools: Bash+Read+Glob+Grep)
- Agent runs: stack detect → tsc --noEmit → eslint → kei-db-contract → playwright (optional)
- Severity rules: TYPE_CHECK FAIL = block, DB_CONTRACT drift > 0 = block, lint = advisory
- skills/dev-guard/SKILL.md extended: 4th agent triggered on .tsx/.ts/.dart edits or DB-layer touches
- adaptive-depth table extended with frontend + DB-layer rows

Wave 3 — auto-dev-guard.sh hook (PostToolUse:Edit|Write):
- Trivial-edit gate: skip if delta < 30 LOC (avoid spawn fatigue)
- File-pattern match: *.tsx|*.ts|*.svelte|*.vue|*.dart OR migrations/*.sql OR src/db/** OR src/types/** OR prisma/schema.prisma OR drizzle.config.*
- Auto-runs kei-db-contract for DB-layer edits if binary on PATH
- Stderr advisory only (exit 0 always — never blocks)
- Bypass: KEI_DISABLED_HOOKS or KEI_HOOK_PROFILE in {advisory-off, minimal, off}
- Smoke-tested with synthetic Edit input (39 LOC delta on .tsx → emits advisory)
- Registered in hooks/hooks.json under PostToolUse:Write|Edit chain

Reusability map (Constructor Pattern compose):
  shared cubes: detect-stack, tsc, eslint, kei-db-contract, kei-visual-snapshot (deferred)
  orchestrators: /dev-start (pre), /dev-guard (during, NOW with frontend-validator),
                 /dev-ship (final), /site-create (init)

Verify-before-commit (RULE 0.13):
- cargo check -p kei-db-contract: PASS
- cargo test -p kei-db-contract: 5 passed
- jq . hooks/hooks.json: valid
- bash hooks/auto-dev-guard.sh < synthetic-input: works (frontend-relevant edit detected, exit 0)

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
cargo-test: PASS (5 tests, 0 failures)
behaviour-verified: yes
follow-up-required:
  - kei-visual-snapshot primitive (Playwright wrap) — Wave 4, deferred
  - /dev-start frontend-contract-designer agent + /dev-ship frontend-final-gate — Wave 5, after Wave 1-3 obkatka
  - install.sh wiring for kei-db-contract binary
  - hermes-style emit-on-drift advisory mode

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:34:39 +08:00
Parfii-bot
0be354a920 KeiSeiKit-public — clean state
Single-commit clean baseline after security scrub of niche-tells,
project codenames, internal jargon, and contributor-email leaks.

Contents:
- 100 Rust crates (_primitives/_rust/)
- 37 agent manifests (_manifests/) + generated specs (_generated/)
- 67 user-invocable skills (skills/)
- 33 hooks (hooks/)
- Composition blocks (_blocks/)
- Documentation (docs/, README.md)
- TS adapter packages (_ts_packages/)
- Assembler (_assembler/)
- Roles (_roles/)
- Templates (_templates/)
- Forgejo CI (.forgejo/)

Author: Denis Parfionovich <info@greendragon.info>

License: see LICENSE.
2026-05-01 12:09:03 +08:00