KeiSeiKit-1.0/docs/MEMORY-FORMAT.md
Parfii-bot d2068cded7 docs: reviewer-response — honesty pass + portable format specs
External reviewer raised 7 overclaim/scope concerns. Agents verified each
against source; this commit applies all fixes that landed in docs.

Honesty pass:
- README:25-29 — Cortex daemon track listed as alpha (was beta); MCP server
  marked "alpha (unpublished) — install via local dist build"; Phase B
  noted "auto-codification not yet wired (manual via /escalate-recurrence)";
  keigit framed as author-operated mirror (KeiSei84 / private Forgejo),
  not neutral community service
- README:95-97 — Cortex CLI/daemon track downgraded beta→alpha
  with rationale (browser-app + VSCode-extension are concept-level)
- docs/ARCHITECTURE.md — added "Model router — current state (2026-05-03)"
  subsection: per-call fixed estimate routing, NO 100-row Bayesian threshold
  in current source (select.rs:74-124); reviewer suggestion deferred
- docs/SLEEP-LAYER.md — added Phase B scope clarification: morning report
  is read-only markdown, no auto-codification path
- docs/PUBLISHING.md — aligned framing with README:43 ("author-operated
  mirror" not "community registry"); added vendor-neutrality note that
  substrate works against any npm-compatible registry
- mcp-server/package.json — added "private": true and description note
  to prevent accidental publish before maturity gate

Portable format specs (reviewer asked for memory-repo agnosticism):
- docs/MEMORY-FORMAT.md (196 LOC) — JSONL schemas for traces / decisions /
  agent-events with jq/awk/pandas recipes, grounded in actual writers
- docs/DNA-FORMAT.md (159 LOC) — DNA wire format ("type::caps::sha8")
  with shell+python parsers
- docs/LEDGER-SCHEMA.md (199 LOC) — full SQLite DDL (agents +
  skill_invocations + indexes + triggers) with sample queries

Auto-regen artifact:
- docs/DNA-INDEX.md — kei-registry regenerated count 564→565

Verification:
- All claims traced to file:line in source by agent a52b29ae
- All new docs ≤200 LOC per Constructor Pattern
- Reality verification verdicts: README/MCP/Phase-B/Cortex VERIFIED;
  Bayesian-router PARTIAL (overclaim removed); keigit PARTIAL (framing
  fixed in this commit); memory-format VERIFIED-FALSE (spec added)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 16:59:25 +08:00


MEMORY-FORMAT — Portable Specification

How to read ~/.claude/memory/ WITHOUT the kei-memory binary. All formats are newline-delimited JSON (JSONL) or SQLite. Derived from source: _primitives/_rust/kei-memory/src/ (2026-05-02).


Section 1 — Event JSONL (Claude Code trace format)

Each session produces one .jsonl file in ~/.claude/memory/traces/. Two wire formats coexist; kei-memory handles both.

1a. Real Claude Code trace (primary format, 2026-04-30+)

{
  "type":       "assistant" | "user" | "system" | "result",
  "timestamp":  "2026-04-30T18:27:10Z",   // RFC-3339, UTC
  "sessionId":  "bf053cbd-...",            // UUID4, groups all lines in session
  "cwd":        "/Users/x/Projects/Foo",  // working directory at event time
  "gitBranch":  "main",
  "uuid":       "u1",                      // event UUID
  "parentUuid": "u0",                      // preceding event UUID (chain)
  "subtype":    "tool_use" | "tool_result" | null,
  "message": {
    "role":    "assistant" | "user",
    "content": [ /* ContentBlock array — see §1c */ ]
  },
  "toolUseID":     "toolu_...",   // present on tool_result lines
  "toolUseResult": { ... }        // present on tool_result lines
}

1b. Legacy KeiSeiKit flat format (back-compat)

{
  "ts":          1700000000,     // Unix epoch seconds (integer)
  "kind":        "tool_use" | "tool_result" | "other",
  "tool":        "Bash" | "Read" | "Edit" | "Write" | ...,
  "file_path":   "/abs/path/file.rs",
  "is_error":    false,
  "event_class": "tool_use:Read",  // pre-classified string
  "message":     "stdout text"
}

1c. ContentBlock inside message.content

// tool_use block (in assistant messages)
{ "type": "tool_use", "id": "t1", "name": "Read", "input": {"file_path": "/a"} }

// tool_result block (in user messages)
{ "type": "tool_result", "tool_use_id": "t1", "content": "...", "is_error": false }
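The two block types are linked by `id` and `tool_use_id`. The sketch below pairs them across a list of already-parsed events. It assumes the block shapes shown above; the function name `pair_tool_calls` and the result layout are hypothetical.

```python
def pair_tool_calls(events):
    """Match tool_use blocks to tool_result blocks by id.

    Returns {tool_use_id: {"name": ..., "is_error": ...}}; a key is missing
    from the inner dict when the matching block was never seen.
    """
    calls = {}
    for ev in events:
        msg = ev.get("message")
        content = msg.get("content") if isinstance(msg, dict) else None
        if not isinstance(content, list):
            continue  # legacy (1b) lines carry a plain string in "message"
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                calls.setdefault(block["id"], {})["name"] = block.get("name")
            elif block.get("type") == "tool_result":
                entry = calls.setdefault(block["tool_use_id"], {})
                entry["is_error"] = bool(block.get("is_error"))
    return calls
```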

1d. event_class labels (assigned by classifier)

- tool_use:<Name>: assistant tool call
- tool_result: user result
- tool_error[:<Name>]: is_error=true
- permission_denied: message matches /permission\s+denied/i
- user_correction: pattern again, опять, stop doing
- worktree_error: pattern worktree.*(error|denied|fail)
- cargo_workspace: pattern cargo.*workspace
- retry_loop: pattern retry|retrying|attempt \d+
- <kind>: type field fallback
- other: default
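For readers who need to assign labels without the binary, the label rules can be sketched as below. This is an approximation, not the kei-memory classifier: the evaluation order, the reading of the user_correction entry as an alternation, and the function name `classify` are all assumptions. Only the pattern strings come from the list above.

```python
import re

# Patterns copied from the label list; order of evaluation is an assumption.
_MESSAGE_RULES = [
    ("permission_denied", re.compile(r"permission\s+denied", re.IGNORECASE)),
    ("user_correction", re.compile(r"again|опять|stop doing")),
    ("worktree_error", re.compile(r"worktree.*(error|denied|fail)")),
    ("cargo_workspace", re.compile(r"cargo.*workspace")),
    ("retry_loop", re.compile(r"retry|retrying|attempt \d+")),
]

def classify(kind, tool=None, is_error=False, message=""):
    """Return an event_class label for one event (hypothetical helper)."""
    if is_error:
        return f"tool_error:{tool}" if tool else "tool_error"
    if kind == "tool_use" and tool:
        return f"tool_use:{tool}"
    if kind == "tool_result":
        return "tool_result"
    for label, pattern in _MESSAGE_RULES:
        if pattern.search(message or ""):
            return label
    # Fall back to the event's own kind/type, then to "other".
    return kind or "other"
```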


Section 2 — Time-metrics journals (RULE 0.18)

Path: ~/.claude/memory/time-metrics/{sessions,tasks,numeric-claims}.jsonl

sessions.jsonl

{
  "kind":        "session",
  "id":          "bf053cbd-a6f8-47a6-a80f-11b829d63980",  // Claude Code sessionId
  "start_epoch": 1777473449,   // Unix epoch seconds
  "end_epoch":   1777473560,
  "duration_s":  111,
  "ts":          "2026-04-29T14:39:20Z"  // RFC-3339 wall-clock
}
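In the sample record, `duration_s` equals `end_epoch - start_epoch` (1777473560 - 1777473449 = 111), and `ts` is the wall-clock rendering of `end_epoch`. Whether the writer guarantees either relation is an assumption; the sketch below only checks them, and `check_session` is a hypothetical helper name.

```python
from datetime import datetime

def check_session(rec: dict) -> list:
    """Return a list of inconsistencies found in one sessions.jsonl record."""
    problems = []
    if rec["end_epoch"] - rec["start_epoch"] != rec["duration_s"]:
        problems.append("duration_s != end_epoch - start_epoch")
    # Assumption: ts is the wall-clock time of end_epoch (true for the sample).
    wall = datetime.fromisoformat(rec["ts"].replace("Z", "+00:00"))
    if int(wall.timestamp()) != rec["end_epoch"]:
        problems.append("ts != end_epoch")
    return problems

sample = {
    "kind": "session",
    "id": "bf053cbd-a6f8-47a6-a80f-11b829d63980",
    "start_epoch": 1777473449,
    "end_epoch": 1777473560,
    "duration_s": 111,
    "ts": "2026-04-29T14:39:20Z",
}
print(check_session(sample))  # []
```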

tasks.jsonl

{
  "kind":        "task" | "start" | "stop",
  "name":        "wave10-agent-decomposition",
  "start_epoch": 1777438080,
  "end_epoch":   1777438665,
  "duration_s":  585,
  "exit":        0,             // present on "stop" records
  "metric": {                   // optional, task-specific counters
    "new_atomars": 25,
    "new_rules":   1
  },
  "source": "~/.claude/agents/_manifests/",
  "ts":     "2026-04-29T04:57:45Z"
}

numeric-claims.jsonl

{
  "kind":          "claim",
  "value":         "wave 5 took 18 min",
  "evidence_tier": "REAL" | "FROM-JOURNAL" | "ESTIMATE-HTC",
  "pointer":       "tasks.jsonl#wave5-2026-04-29",
  "ts":            "2026-04-29T05:00:00Z",
  "session_id":    "bf053cbd-..."  // optional
}
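A claim's `pointer` looks like a file name and an anchor joined by `#`. How the anchor resolves inside the target journal is not specified here, so the sketch below only splits the pointer and tallies claims by `evidence_tier`. The helper names and the "UNKNOWN" bucket are hypothetical.

```python
from collections import Counter

def split_pointer(pointer: str):
    """Split 'file#anchor' into (file, anchor); anchor is None when there is no '#'."""
    file, sep, anchor = pointer.partition("#")
    return file, (anchor if sep else None)

def tier_counts(claims):
    """Count claim records per evidence_tier, ignoring non-claim records."""
    return Counter(
        c.get("evidence_tier", "UNKNOWN")
        for c in claims
        if c.get("kind") == "claim"
    )
```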

Section 3 — Reading with jq

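Q1: all tool_use:Bash events since 2026-04-25 (legacy lines via event_class and epoch ts; 1a lines via tool_use content blocks and timestamp)

cat ~/.claude/memory/traces/*.jsonl \
  | jq -c '("2026-04-25T00:00:00Z" | fromdateiso8601) as $since
           | select(.event_class == "tool_use:Bash"
                    or any(.message | objects | .content | arrays | .[] | objects;
                           .type == "tool_use" and .name == "Bash"))
           | select((.ts // 0) >= $since or (.timestamp // "") >= "2026-04-25")'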

Q2: median event-count per sessionId

cat ~/.claude/memory/traces/*.jsonl \
  | jq -s 'map(select((.sessionId // .session_id) != null))
           | group_by(.sessionId // .session_id)
           | map({s: (.[0].sessionId // .[0].session_id), n: length}) | sort_by(.n) | .[length/2|floor]'

Q3: errors-per-session timeline

cat ~/.claude/memory/traces/*.jsonl \
  | jq -c 'select(.is_error == true or (.event_class // "" | startswith("tool_error"))
                  or any(.message | objects | .content | arrays | .[] | objects; .is_error == true))
           | {session: (.sessionId // .session_id), ts: (.timestamp // .ts)}' \
  | sort | uniq -c

Q4: median duration_s for tasks containing "audit"

jq -s '[.[] | select(.kind=="task" and (.name|test("audit"))) | .duration_s]
       | sort | .[length/2|floor]' ~/.claude/memory/time-metrics/tasks.jsonl
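The jq expression `sort | .[length/2|floor]` picks the upper of the two middle values when the list has an even length, so it can differ from the arithmetic median. A small illustration in Python, using made-up durations rather than real journal data:

```python
import statistics

# Hypothetical durations in seconds; not taken from a real tasks.jsonl.
durations = [585, 100, 300, 200]

def jq_style_median(values):
    """Mirror of jq's `sort | .[length/2|floor]`; returns None for an empty list."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2] if ordered else None

print(jq_style_median(durations))           # 300
print(statistics.median_high(durations))    # 300
print(statistics.median(durations))         # 250.0
```

For odd-length lists all three agree.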

Q5: token cost outliers (v9 ledger)

sqlite3 ~/.claude/agents/ledger.sqlite \
  "SELECT id, COALESCE(tokens_in,0)+COALESCE(tokens_out,0) AS tok
   FROM agents WHERE tok > 100000 ORDER BY tok DESC LIMIT 20;"
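The same query can be run from Python with the standard-library `sqlite3` module. This sketch assumes only what the query above implies: an `agents` table with `id`, `tokens_in`, and `tokens_out` columns. The function name `token_outliers` is hypothetical. The sum is repeated in the WHERE clause instead of relying on the alias, which keeps the statement valid on engines that do not accept aliases there.

```python
import sqlite3

def token_outliers(conn, threshold=100_000, limit=20):
    """Return (id, total_tokens) rows above threshold, largest first."""
    sql = """
        SELECT id, COALESCE(tokens_in, 0) + COALESCE(tokens_out, 0) AS tok
        FROM agents
        WHERE COALESCE(tokens_in, 0) + COALESCE(tokens_out, 0) > ?
        ORDER BY tok DESC
        LIMIT ?
    """
    return conn.execute(sql, (threshold, limit)).fetchall()
```

To use it against the ledger, open the file with `sqlite3.connect(os.path.expanduser("~/.claude/agents/ledger.sqlite"))` and pass the connection in.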

Section 4 — Reading with pandas

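import glob, json, os
import pandas as pd

def load_traces(pattern="~/.claude/memory/traces/*.jsonl"):
    """Load every trace line into one DataFrame; keys missing on a line become NaN."""
    rows = []
    for path in glob.glob(os.path.expanduser(pattern)):
        with open(path, encoding="utf-8") as fh:  # close each file promptly
            rows.extend(json.loads(line) for line in fh if line.strip())
    return pd.DataFrame(rows)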

# Recipe 1: event class frequency
print(load_traces()["event_class"].value_counts().head(20))

# Recipe 2: sessions.jsonl duration histogram
sess = pd.read_json("~/.claude/memory/time-metrics/sessions.jsonl", lines=True)
sess[sess.kind == "session"]["duration_s"].hist(bins=30)

# Recipe 3: tasks median duration by name prefix
tasks = pd.read_json("~/.claude/memory/time-metrics/tasks.jsonl", lines=True)
tasks[tasks.kind == "task"].groupby(
    tasks["name"].str.split("-").str[0])["duration_s"].median()

Section 5 — Reading with awk (streaming)

Recipe 1: count events per event_class (no jq required; assumes compact JSON with no whitespace after the colon)

cat ~/.claude/memory/traces/*.jsonl \
  | awk -F'"event_class":"' 'NF>1 {split($2,a,"\""); counts[a[1]]++}
    END {for (k in counts) print counts[k], k}' \
  | sort -rn | head -20

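Recipe 2: extract all Bash commands from traces (first 120 characters of each; the match stops at the first double quote, so commands containing quotes are cut short)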

cat ~/.claude/memory/traces/*.jsonl \
  | awk -F'"command":"' 'NF>1 {
      split($2, a, "\"")
      gsub(/\\n/, "\n", a[1])
      print substr(a[1], 1, 120)
    }'