KeiSeiKit-1.0

keisei/KeiSeiKit-1.0

Fork 0

Commit graph

Author	SHA1	Message	Date
Parfii-bot	05db01bfd6	feat(live-graph): WebSocket activity stream — orchestrator-centric live view User pushback: "транслирует в онлайне какие агенты создаются? основное окно агента, а дальше при запусках появляются новые ветки, мы показываем в онлайне как агенты собираются и работают" Earlier `kei-graph-export` rendered the static SUBSTRATE (all 581 atoms, catalog-style). User wanted the LIFECYCLE: orchestrator at center, every new agent as a fading-in branch, every tool call as a pulse, every completion as a fade-out. TTL = until done; pure online, no history accumulation per user direction. Three-layer architecture, all conforming to schema /tmp/agent-events-schema.md: LAYER 1 — Event emitters (4 hooks) hooks/agent-event-spawn.sh PreToolUse:Agent → agent_spawn event hooks/agent-event-done.sh PostToolUse:Agent → agent_done event (parses STATUS-TRUTH MARKER for outcome, computes cost_usd from token×pricing table) hooks/tool-use-event.sh PreToolUse:Bash\|Read\|Edit\|Write\|Grep\|Glob\|NotebookEdit → tool_use event hooks/skill-record.sh EXTENDED — second emit step writes skill_use event in addition to existing kei-ledger record-skill call All 4 are POSIX /bin/sh, defensive (never block, exit 0), bypass via KEI_EVENTS_BYPASS=1. Append-only JSONL to ~/.claude/memory/agent-events.jsonl. Smoke: 4 synthetic invocations cover spawn/done/tool/filter cases. LAYER 2 — kei-graph-stream Rust daemon _primitives/_rust/kei-graph-stream/ (~480 LOC, 5 files + 1 test) - Tails events.jsonl every 200ms (poll-based, no notify dep). - Parses each event, updates AliveState (insert on spawn, remove on done). - Broadcasts {"type":"event","data":<event>} to all WebSocket clients. - On client connect: sends {"type":"snapshot","alive":[...]} first. - Heartbeat: {"type":"ping"} every 30s. - axum 0.7 + ws feature (already in Cargo.lock via kei-cortex). - Bypass: KEI_GRAPH_STREAM_BYPASS=1. Bound to 127.0.0.1:8201 (loopback only). Endpoints: GET /stream → WebSocket upgrade GET /health → "kei-graph-stream alive" 4 unit + 1 integration test. cargo build clean. Installed binary: ~/.cargo/bin/kei-graph-stream Launchd plist: io.keisei.graph-stream (RunAtLoad, KeepAlive) Loaded as PID 52678, /health 200 OK verified. LAYER 3 — live-graph.html (single-file frontend) ~/Projects/lbm-graph-viz/live-graph.html (~464 LOC, self-contained) - SVG full-viewport, dark #0f172a, CSS grid background. - Pinned center node "main" (orchestrator), gold #fbbf24, glowing. - Agents radiate via D3 force-simulation; color-by-model (sonnet=green, opus=red, haiku=blue, default=gray). - On agent_spawn: fade-in 300ms, edge from main to new node. - On tool_use: pulse on agent node (r 8→12→8 over 400ms) + floating tool name label fades 800ms. - On agent_done: outcome-color flash → fade-out 800ms → remove. - WebSocket client: ws://127.0.0.1:8201/stream, exponential-backoff reconnect (1s→30s). - Top-right status badge: ● connected \| ○ reconnecting \| ✕ disconnected. - Bottom counters: alive / spawned / tool calls / done / last event age. - No build step. D3 v7 from CDN. Pure HTML+JS+CSS. End-to-end smoke (this machine, just now): - daemon health 200 OK - hook injected agent_spawn → daemon broadcasts → AliveState=1 - hook injected agent_done → daemon broadcasts → AliveState=0 - frontend file syntax-checked clean What this does NOT do (deferred, by user direction "это онлайн"): - History persistence — agents who finished are GONE from the graph. Per-session log remains in events.jsonl + sleep-sync if user wants to consult later, but the live view is RIGHT NOW only. - Sub-agent attribution beyond "main" — orchestrator-direct tool calls show on the orchestrator node. Sub-agent's internal tool calls would need session-id correlation; current schema has agent_id="main" placeholder for non-Agent tool calls. - Replay mode — no time-scrubber. Possible follow-up if useful. - Auth on WebSocket — bound to 127.0.0.1 only. Local-only by design. === STATUS-TRUTH MARKER === shipped: functional stubs: 0 cargo-check: PASS behaviour-verified: yes follow-up-required: - Sub-agent tool-call attribution (correlate session_id chain) - Replay mode with time scrubber (if user finds use) - Tool aggregator nodes ("Bash bucket" with N) instead of per-agent pulses Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 13:30:24 +08:00
Parfii-bot	b332b571bf	feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal Closes the loop on "without full tracking the system can't make decisions" (user pushback on partial coverage). Three gaps that left the inference layer blind are now wired: GAP #1 — agent toolStats / token counts / cache hits captured ================================================================ `agent-outcome-backfill.sh` now appends one JSONL row per spawn to `~/.claude/memory/time-metrics/agent-toolstats.jsonl` with: agent_id, outcome, stubs, ts, tool_use_count, duration_ms, tool_stats {Read:N, Bash:M, ...}, tokens_in, tokens_out, cache_read, cache_write Sidecar journal (no schema migration). Production payload's .tool_response.totalToolUseCount / totalDurationMs / toolStats / usage fields land directly. Smoke-tested with synthetic spawn — row written. GAP #2 — skill_invocations table actually receives writes ================================================================ The `skill_invocations` table (schema v8) had 0 rows because no caller existed for `skill_metrics::record_invocation`. Added two pieces: (a) `kei-ledger record-skill <name> --success {0\|1}` CLI subcommand Mirrors record-cost; same dispatch shape. Optional `--agent-id`, `--trajectory-id`, `--duration-ms`, `--db`. Validates non-empty name + duration ≥ 0. Outputs `{"ok":true,"skill":"...","ts":N}`. (b) `hooks/skill-record.sh` — PostToolUse:Skill hook. 50 LOC POSIX. Detects Skill tool calls, derives success heuristic from tool_response (exit_code / status / content non-empty), shells out to `kei-ledger record-skill`. Bypass via SKILL_RECORD_BYPASS=1. 83 kei-ledger tests pass (16 unit + 67 integration). Smoke-tested end-to-end: `kei-ledger record-skill test-skill --success 1` inserts a row with correct fields. Phase D nightly skill-metrics decisions (archive if unused N days, re-extract if success<60% over M days, validated if >20 calls + >90% success) now have data to consume. GAP #3 — numeric-claims.jsonl receives every evidence-tagged claim ================================================================ RULE 0.18 mandated three markers `[REAL:]` / `[FROM-JOURNAL:]` / `[ESTIMATE-HTC:]` on every numeric/duration/cost claim, but no hook appended valid claims to the journal — the calibration data RULE 0.18 promised never accumulated. `hooks/numeric-claims-record.sh` — Stop hook, 140 LOC POSIX. Reads transcript_path from stdin, locates the last assistant message via recursive flatten (same pattern as agent-outcome-backfill.sh after the production-payload-shape fix), regex-extracts every `<phrase> [<TIER>: <pointer>]` triple, appends one JSONL row per claim. Idempotent within 1-second window to avoid double-recording on repeat Stop fires. Bypass via NUMERIC_CLAIMS_RECORD_BYPASS=1. Smoke test: synthetic transcript with 3 markers (REAL + ESTIMATE-HTC + FROM-JOURNAL) produced exactly 3 well-formed JSONL rows. Settings.json ================================================================ - PostToolUse:Skill matcher created (or augmented if already present) with skill-record.sh. - Stop:* matcher gains numeric-claims-record.sh after the existing chain (stop-verify, task-timer, session-end-dump, extract-task- durations, chat-numeric-postflag, affect-threshold-check, enrich-from-jsonl). What this does NOT do (deferred): - Backfill `skill_invocations` from past traces (history started today; Phase D cohort builds forward from now). - Migrate the agent toolStats sidecar JSONL into a proper ledger column. Append-only file is fine for the current scale. - Refactor main.rs (now 233 LOC, was 212; pre-existing CP debt flagged by skill-record agent — separate cleanup PR). === STATUS-TRUTH MARKER === shipped: functional stubs: 0 cargo-check: PASS behaviour-verified: yes follow-up-required: - kei-ledger main.rs Constructor Pattern split (212→233 LOC) - Verify in next session: skill_invocations gets rows from real Skill tool use; numeric-claims.jsonl gets rows from real assistant messages with markers Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 03:42:09 +08:00

Author

SHA1

Message

Date

Parfii-bot

05db01bfd6

feat(live-graph): WebSocket activity stream — orchestrator-centric live view

User pushback: "транслирует в онлайне какие агенты создаются? основное
окно агента, а дальше при запусках появляются новые ветки, мы показываем
в онлайне как агенты собираются и работают"

Earlier `kei-graph-export` rendered the static SUBSTRATE (all 581 atoms,
catalog-style). User wanted the LIFECYCLE: orchestrator at center, every
new agent as a fading-in branch, every tool call as a pulse, every
completion as a fade-out. TTL = until done; pure online, no history
accumulation per user direction.

Three-layer architecture, all conforming to schema /tmp/agent-events-schema.md:

LAYER 1 — Event emitters (4 hooks)
  hooks/agent-event-spawn.sh   PreToolUse:Agent  → agent_spawn event
  hooks/agent-event-done.sh    PostToolUse:Agent → agent_done event
                               (parses STATUS-TRUTH MARKER for outcome,
                                computes cost_usd from token×pricing table)
  hooks/tool-use-event.sh      PreToolUse:Bash|Read|Edit|Write|Grep|Glob|NotebookEdit
                               → tool_use event
  hooks/skill-record.sh        EXTENDED — second emit step writes skill_use
                               event in addition to existing kei-ledger
                               record-skill call

  All 4 are POSIX /bin/sh, defensive (never block, exit 0), bypass via
  KEI_EVENTS_BYPASS=1. Append-only JSONL to
  ~/.claude/memory/agent-events.jsonl.

  Smoke: 4 synthetic invocations cover spawn/done/tool/filter cases.

LAYER 2 — kei-graph-stream Rust daemon
  _primitives/_rust/kei-graph-stream/  (~480 LOC, 5 files + 1 test)

  - Tails events.jsonl every 200ms (poll-based, no notify dep).
  - Parses each event, updates AliveState (insert on spawn, remove on done).
  - Broadcasts {"type":"event","data":<event>} to all WebSocket clients.
  - On client connect: sends {"type":"snapshot","alive":[...]} first.
  - Heartbeat: {"type":"ping"} every 30s.
  - axum 0.7 + ws feature (already in Cargo.lock via kei-cortex).
  - Bypass: KEI_GRAPH_STREAM_BYPASS=1.

  Bound to 127.0.0.1:8201 (loopback only). Endpoints:
    GET /stream  → WebSocket upgrade
    GET /health  → "kei-graph-stream alive"

  4 unit + 1 integration test. cargo build clean.

  Installed binary: ~/.cargo/bin/kei-graph-stream
  Launchd plist: io.keisei.graph-stream (RunAtLoad, KeepAlive)
  Loaded as PID 52678, /health 200 OK verified.

LAYER 3 — live-graph.html (single-file frontend)
  ~/Projects/lbm-graph-viz/live-graph.html  (~464 LOC, self-contained)

  - SVG full-viewport, dark #0f172a, CSS grid background.
  - Pinned center node "main" (orchestrator), gold #fbbf24, glowing.
  - Agents radiate via D3 force-simulation; color-by-model
    (sonnet=green, opus=red, haiku=blue, default=gray).
  - On agent_spawn: fade-in 300ms, edge from main to new node.
  - On tool_use: pulse on agent node (r 8→12→8 over 400ms) +
    floating tool name label fades 800ms.
  - On agent_done: outcome-color flash → fade-out 800ms → remove.
  - WebSocket client: ws://127.0.0.1:8201/stream, exponential-backoff
    reconnect (1s→30s).
  - Top-right status badge: ● connected | ○ reconnecting | ✕ disconnected.
  - Bottom counters: alive / spawned / tool calls / done / last event age.
  - No build step. D3 v7 from CDN. Pure HTML+JS+CSS.

End-to-end smoke (this machine, just now):
  - daemon health 200 OK
  - hook injected agent_spawn → daemon broadcasts → AliveState=1
  - hook injected agent_done  → daemon broadcasts → AliveState=0
  - frontend file syntax-checked clean

What this does NOT do (deferred, by user direction "это онлайн"):
  - History persistence — agents who finished are GONE from the graph.
    Per-session log remains in events.jsonl + sleep-sync if user wants
    to consult later, but the live view is RIGHT NOW only.
  - Sub-agent attribution beyond "main" — orchestrator-direct tool calls
    show on the orchestrator node. Sub-agent's internal tool calls would
    need session-id correlation; current schema has agent_id="main"
    placeholder for non-Agent tool calls.
  - Replay mode — no time-scrubber. Possible follow-up if useful.
  - Auth on WebSocket — bound to 127.0.0.1 only. Local-only by design.

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
behaviour-verified: yes
follow-up-required:
  - Sub-agent tool-call attribution (correlate session_id chain)
  - Replay mode with time scrubber (if user finds use)
  - Tool aggregator nodes ("Bash bucket" with N) instead of per-agent pulses

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 13:30:24 +08:00

Parfii-bot

b332b571bf

feat(tracking): close 3 last observability gaps — toolStats + skill-record + numeric-claims journal

Closes the loop on "without full tracking the system can't make decisions"
(user pushback on partial coverage). Three gaps that left the inference
layer blind are now wired:

GAP #1 — agent toolStats / token counts / cache hits captured
================================================================
`agent-outcome-backfill.sh` now appends one JSONL row per spawn to
`~/.claude/memory/time-metrics/agent-toolstats.jsonl` with:
  agent_id, outcome, stubs, ts,
  tool_use_count, duration_ms, tool_stats {Read:N, Bash:M, ...},
  tokens_in, tokens_out, cache_read, cache_write
Sidecar journal (no schema migration). Production payload's
.tool_response.totalToolUseCount / totalDurationMs / toolStats / usage
fields land directly. Smoke-tested with synthetic spawn — row written.

GAP #2 — skill_invocations table actually receives writes
================================================================
The `skill_invocations` table (schema v8) had 0 rows because no caller
existed for `skill_metrics::record_invocation`. Added two pieces:

(a) `kei-ledger record-skill <name> --success {0|1}` CLI subcommand
    Mirrors record-cost; same dispatch shape. Optional `--agent-id`,
    `--trajectory-id`, `--duration-ms`, `--db`. Validates non-empty
    name + duration ≥ 0. Outputs `{"ok":true,"skill":"...","ts":N}`.

(b) `hooks/skill-record.sh` — PostToolUse:Skill hook. 50 LOC POSIX.
    Detects Skill tool calls, derives success heuristic from
    tool_response (exit_code / status / content non-empty), shells
    out to `kei-ledger record-skill`. Bypass via SKILL_RECORD_BYPASS=1.

83 kei-ledger tests pass (16 unit + 67 integration). Smoke-tested
end-to-end: `kei-ledger record-skill test-skill --success 1` inserts
a row with correct fields.

Phase D nightly skill-metrics decisions (archive if unused N days,
re-extract if success<60% over M days, validated if >20 calls + >90%
success) now have data to consume.

GAP #3 — numeric-claims.jsonl receives every evidence-tagged claim
================================================================
RULE 0.18 mandated three markers `[REAL:]` / `[FROM-JOURNAL:]` /
`[ESTIMATE-HTC:]` on every numeric/duration/cost claim, but no hook
appended valid claims to the journal — the calibration data RULE 0.18
promised never accumulated.

`hooks/numeric-claims-record.sh` — Stop hook, 140 LOC POSIX. Reads
transcript_path from stdin, locates the last assistant message via
recursive flatten (same pattern as agent-outcome-backfill.sh after
the production-payload-shape fix), regex-extracts every `<phrase>
[<TIER>: <pointer>]` triple, appends one JSONL row per claim.

Idempotent within 1-second window to avoid double-recording on
repeat Stop fires. Bypass via NUMERIC_CLAIMS_RECORD_BYPASS=1.

Smoke test: synthetic transcript with 3 markers (REAL + ESTIMATE-HTC
+ FROM-JOURNAL) produced exactly 3 well-formed JSONL rows.

Settings.json
================================================================
- PostToolUse:Skill matcher created (or augmented if already
  present) with skill-record.sh.
- Stop:* matcher gains numeric-claims-record.sh after the existing
  chain (stop-verify, task-timer, session-end-dump, extract-task-
  durations, chat-numeric-postflag, affect-threshold-check,
  enrich-from-jsonl).

What this does NOT do (deferred):
  - Backfill `skill_invocations` from past traces (history started
    today; Phase D cohort builds forward from now).
  - Migrate the agent toolStats sidecar JSONL into a proper ledger
    column. Append-only file is fine for the current scale.
  - Refactor main.rs (now 233 LOC, was 212; pre-existing CP debt
    flagged by skill-record agent — separate cleanup PR).

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
behaviour-verified: yes
follow-up-required:
  - kei-ledger main.rs Constructor Pattern split (212→233 LOC)
  - Verify in next session: skill_invocations gets rows from real
    Skill tool use; numeric-claims.jsonl gets rows from real assistant
    messages with markers

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 03:42:09 +08:00

2 commits