KeiSeiKit-1.0/docs/SUBSTRATE-SCHEMA.md
Parfii-bot 58944e15bd docs(substrate): v1 atom/capability/graph SSoT schema — DRAFT for review
Substrate thesis requires a single source of truth before parallel work
streams (UI/Atoms/Graph/Runtime) can proceed independently without drift.
This document is that SSoT.

Key decisions baked in (open to revision before lock):

- Atom = one verb on a primitive, not one crate. Target ~150 atoms
  across current 25 crates. Crate = physical container, atom = unit of
  composition.

- File layout: src/atoms/<verb>.rs (code) + atoms/<verb>.md (docs with
  machine-parseable YAML frontmatter) + atoms/schemas/*.json (JSON
  Schema draft-07 for input/output) + capabilities.toml (auto-generated
  aggregator, committed to repo).

- Atom kinds: command / query / stream / transform. Combined with
  side_effects[] and idempotent flag, runtime decides retry safety,
  parallelism, caching.

- Naming: <crate>::<verb> globally unique. Rust :: separator keeps it
  native-feeling.

- Versioning: atoms inherit crate SemVer. Breaking change to an atom =
  new atom (create-v2), old marked deprecated.

- Runtime contract: `kei-runtime invoke <atom-id> --input <json>` with
  schema validation at entry + exit, ledger row per invocation.

- Graph contract: kei-sage auto-walks atoms/*.md, resolves [[atom-id]]
  wikilinks, exposes rank / related / search / graph over atom corpus.

- UI contract: kei-forge web wizard generates .md + .json + .rs + test
  from form input; postcondition cargo check + kei-schema-lint pass.

Document declares 4 stream interfaces explicitly — each stream knows
what it reads from this schema, what it writes, what it does NOT depend
on from other streams. Enables true parallel work.

6 open questions flagged for user review at bottom:
1) JSON Schema draft-07 vs 2020-12
2) Atom ID separator :: vs /
3) side_effects strings vs structured
4) capabilities.toml committed vs gitignored
5) kei-atom-template in this PR or defer to Stream A
6) Error model per-atom vs shared registry

STATUS: DRAFT — awaits user approval + SCHEMA-LOCKED.md marker before
parallel streams start. Once locked, breaking changes require explicit
revocation + all-streams sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 23:39:24 +08:00


KeiSeiKit Substrate Schema v1

STATUS: Draft — under review. Once approved, this document is LOCKED for 6 weeks of parallel stream work (RULE: breaking changes require explicit user revocation + all-streams sync).

PURPOSE: Single Source of Truth for the atom / capability / graph schema that enables the substrate composition layer. Four parallel work streams (UI / Atoms refactor / Graph / Runtime) all depend on this contract.


Core concept: atom = one verb

An atom is one verb (one operation) on a primitive, not one crate. Example: the kei-task crate decomposes into kei-task::create, kei-task::add-dependency, kei-task::search, … Each atom is independently:

  • Documented (one .md file)
  • Schema-specified (JSON Schema for input + output)
  • Callable (one Rust function)
  • Discoverable (aggregated into capabilities.toml)
  • Composable (runtime pipes atoms by schema compatibility)

Granularity target: ~150 atoms across the current 25 crates. Crate = physical container; atom = unit of composition.


File layout per crate

_primitives/_rust/<crate>/
├── Cargo.toml
├── capabilities.toml           ← AUTO-GENERATED from atoms/*.md frontmatter
│                                  (build.rs runs on cargo build; commit the
│                                   generated file so CI consumers see it)
├── src/
│   ├── main.rs                 ← CLI dispatcher — parses argv, calls atom fn
│   ├── atoms/
│   │   ├── mod.rs
│   │   ├── create.rs           ← one file per atom impl, pub fn run(input: ...) -> ...
│   │   ├── add_dependency.rs
│   │   └── search.rs
│   └── schema.rs               ← Rust types that match JSON Schemas
├── atoms/                       ← HUMAN-FACING docs, machine-parseable frontmatter
│   ├── create.md
│   ├── add-dependency.md
│   ├── search.md
│   └── schemas/
│       ├── create-input.json        ← JSON Schema draft-07
│       ├── create-output.json
│       ├── add-dependency-input.json
│       └── …
└── migrations/                  ← per-crate SQLite migrations (kei-migrate)
    └── 0001_initial.sql

Why split src/atoms/ and atoms/: code lives with code (Rust convention), docs live in a flat directory easy for kei-sage to walk and for humans to scan.
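The layout above implies a fixed mapping from an atom verb to the files it owns. A minimal sketch of that mapping, assuming the kebab-case verb becomes a snake_case module name (as `add-dependency` becomes `add_dependency.rs` in the tree); the `AtomPaths` struct and `atom_paths` function are hypothetical names, not part of the schema:

```rust
/// Paths that one atom contributes to its crate, relative to the crate root.
/// (Hypothetical helper type for illustration.)
#[derive(Debug, PartialEq)]
pub struct AtomPaths {
    pub code: String,
    pub doc: String,
    pub input_schema: String,
    pub output_schema: String,
}

/// Derives the per-atom file paths from a kebab-case verb.
/// Rust module names cannot contain '-', so the code file uses snake_case.
pub fn atom_paths(verb: &str) -> AtomPaths {
    let module = verb.replace('-', "_");
    AtomPaths {
        code: format!("src/atoms/{module}.rs"),
        doc: format!("atoms/{verb}.md"),
        input_schema: format!("atoms/schemas/{verb}-input.json"),
        output_schema: format!("atoms/schemas/{verb}-output.json"),
    }
}
```

For example, `atom_paths("add-dependency")` yields `src/atoms/add_dependency.rs` for code and `atoms/add-dependency.md` for docs.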


Atom .md frontmatter schema

Every atoms/<verb>.md file MUST begin with YAML frontmatter matching this shape:

---
# REQUIRED
atom: kei-task::create              # <crate>::<verb> — globally unique ID
kind: command                       # command | query | stream | transform
version: "0.22.3"                   # inherits crate Cargo.toml version

# INPUT / OUTPUT — schemas live in atoms/schemas/ (relative paths)
input:
  schema: schemas/create-input.json
  required: [title]                 # convenience duplication from JSON Schema for CLI help
  example: { title: "Fix auth bug", priority: "high" }

output:
  schema: schemas/create-output.json
  example: { id: 42, created_at: "2026-04-22T15:30:00Z" }

# ERRORS — typed, documented upfront
errors:
  - code: DuplicateTitle
    http_analog: 409
    description: "A task with this title already exists under the same milestone"
  - code: InvalidPriority
    http_analog: 400
    description: "Priority must be one of: low, medium, high"

# SUBSTRATE HINTS — runtime uses these for DAG composition safety
side_effects:                        # [] means pure/readonly
  - "write:kei-task-db"              # domain-prefixed; kei-db-<name> for shared, custom for crate-private
idempotent: false                    # safe to retry? affects runtime retry logic
timeout_ms: 5000                     # default timeout; runtime enforces

# LIFECYCLE
deprecated: null                     # or: "use kei-task::create-v2 — stricter validation"
stability: stable                    # experimental | beta | stable | deprecated

# DISCOVERY
keywords: [task, todo, gtd, planning]
related:                             # wikilinks rendered by kei-sage
  - "[[kei-task::add-dependency]]"
  - "[[kei-milestone::link]]"
---
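Any consumer of an atom file first has to separate the frontmatter from the Markdown body. The sketch below shows one way to do that split with the standard library only. It assumes LF line endings and that the opening `---` is the very first line; `split_frontmatter` is a hypothetical name, and parsing the YAML itself is left to whichever YAML library a stream picks.

```rust
/// Splits an atom .md file into (frontmatter, body).
/// Returns None when the file does not start with a `---` line or when
/// the closing `---` line is missing.
pub fn split_frontmatter(text: &str) -> Option<(&str, &str)> {
    // The opening delimiter must be the very first line (LF endings assumed).
    let rest = text.strip_prefix("---\n")?;
    // Scan line by line for the closing delimiter, tracking the byte offset.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            let frontmatter = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return Some((frontmatter, body));
        }
        offset += line.len();
    }
    None
}
```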

Body (Markdown, free-form)

After frontmatter, the body is human-facing with fixed section conventions:

# kei-task::create

Creates a new task in the DAG. Title must be unique within its milestone scope.

## Example

    kei-task create \
      --title "Fix auth bug" \
      --priority high \
      --description "Token rotation fails on leap second"

Returns JSON: `{"id": 42, "created_at": "2026-04-22T..."}`

## Gotchas

- Title uniqueness is per-milestone, NOT global. Two tasks `"Fix bug"` in
  different milestones are valid.
- `priority` is case-sensitive — `High` returns `InvalidPriority`.

## Related
- [[kei-task::add-dependency]] — link this task into DAG as parent/child
- [[kei-milestone::link]] — group this task under a milestone
- [[rules/RULE 0.12]] — task DAG per Agent Git Model

Sections # <atom-id>, ## Example, ## Gotchas, ## Related are convention, not requirement — but recommended for uniformity so kei-sage can extract sections predictably.
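To illustrate why uniform section names help, here is a rough sketch of how a tool could pull one `## <heading>` section out of an atom body. `extract_section` is a hypothetical helper, not kei-sage's actual API, and it is deliberately naive: a line starting with `# ` inside a code sample would end the section early.

```rust
/// Returns the text under `## <heading>` up to the next `#` or `##` heading,
/// or None if the heading is absent. Naive sketch: headings inside code
/// samples are not distinguished from real headings.
pub fn extract_section(body: &str, heading: &str) -> Option<String> {
    let wanted = format!("## {heading}");
    // Skip everything before the wanted heading.
    let mut lines = body.lines().skip_while(|l| l.trim_end() != wanted);
    // Consume the heading line itself; bail out if it was never found.
    lines.next()?;
    let section: Vec<&str> = lines
        .take_while(|l| !l.starts_with("# ") && !l.starts_with("## "))
        .collect();
    Some(section.join("\n").trim().to_string())
}
```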


capabilities.toml — per-crate aggregator

Auto-generated from all atoms/*.md frontmatter by build.rs. Committed to repo so downstream consumers (kei-sage, kei-runtime, kei-forge) don't need to parse YAML.

[primitive]
name = "kei-task"
version = "0.22.3"
crate_path = "_primitives/_rust/kei-task"
description = "SQLite-backed task DAG with dependencies, milestones, FTS search"

[state]
# State declaration — runtime + kei-forge need this to know where data lives
backend = "sqlite"                                 # sqlite | filesystem | memory | remote
db_env = "KEI_TASK_DB"
db_default = "~/.claude/task/task.sqlite"
migrations_dir = "migrations/"
schema_version = 3

[[atoms]]
name = "create"
full_id = "kei-task::create"
kind = "command"
md_path = "atoms/create.md"
input_schema = "atoms/schemas/create-input.json"
output_schema = "atoms/schemas/create-output.json"
side_effects = ["write:kei-task-db"]
idempotent = false
timeout_ms = 5000
stability = "stable"

[[atoms]]
name = "add-dependency"
full_id = "kei-task::add-dependency"
kind = "command"
md_path = "atoms/add-dependency.md"
# … etc

[[atoms]]
name = "search"
full_id = "kei-task::search"
kind = "query"
side_effects = []           # empty = pure read
idempotent = true
# …

Validation: kei-schema-lint (new tool in Runtime stream) checks capabilities.toml is consistent with atoms/*.md frontmatter on every CI run.
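The consistency check could look roughly like the sketch below. It assumes both sources have already been parsed into the same record type; `AtomRecord` and `lint` are hypothetical names, and only a subset of the frontmatter fields is compared here.

```rust
/// The subset of atom metadata compared by this sketch (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct AtomRecord {
    pub full_id: String,
    pub kind: String,
    pub side_effects: Vec<String>,
    pub idempotent: bool,
}

/// Compares records parsed from atoms/*.md frontmatter with the [[atoms]]
/// entries of capabilities.toml and returns one message per inconsistency.
pub fn lint(from_md: &[AtomRecord], from_toml: &[AtomRecord]) -> Vec<String> {
    let mut problems = Vec::new();
    for md in from_md {
        match from_toml.iter().find(|t| t.full_id == md.full_id) {
            None => problems.push(format!("{}: missing from capabilities.toml", md.full_id)),
            Some(t) if t != md => {
                problems.push(format!("{}: capabilities.toml is stale", md.full_id))
            }
            Some(_) => {}
        }
    }
    // Entries in the aggregator that no longer have a source .md file.
    for t in from_toml {
        if !from_md.iter().any(|m| m.full_id == t.full_id) {
            problems.push(format!("{}: no matching atoms/*.md file", t.full_id));
        }
    }
    problems
}
```

An empty result means the committed capabilities.toml matches the frontmatter, which is the condition CI would enforce.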


JSON Schema conventions (input / output)

  • Draft: JSON Schema draft-07 (widely supported, including by the jsonschema and schemars Rust crates).
  • File naming: <verb>-input.json, <verb>-output.json.
  • Shared types: put under atoms/schemas/_shared/<Type>.json, reference via $ref.
  • Examples: every schema MUST have examples: [...] (used by kei-forge live preview + runtime smoke tests).

Minimal example — atoms/schemas/create-input.json:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "kei-task/atoms/schemas/create-input.json",
  "title": "kei-task::create input",
  "type": "object",
  "required": ["title"],
  "properties": {
    "title": { "type": "string", "minLength": 1, "maxLength": 200 },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "description": { "type": "string" },
    "milestone_id": { "type": "integer", "minimum": 1 }
  },
  "additionalProperties": false,
  "examples": [
    { "title": "Fix auth bug", "priority": "high" }
  ]
}
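The file layout mentions src/schema.rs as the home of "Rust types that match JSON Schemas". A hand-written sketch of what such a type could look like for the schema above follows. The names `CreateInput`, `Priority`, and `validate` are assumptions for illustration; a real implementation might derive these with serde or schemars instead of checking constraints by hand.

```rust
/// Mirrors the `priority` enum of create-input.json.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Priority {
    Low,
    Medium,
    High,
}

impl Priority {
    /// Case-sensitive, like the schema's enum: "High" is rejected.
    pub fn parse(s: &str) -> Option<Priority> {
        match s {
            "low" => Some(Priority::Low),
            "medium" => Some(Priority::Medium),
            "high" => Some(Priority::High),
            _ => None,
        }
    }
}

/// Mirrors create-input.json: `title` is required, the rest optional.
#[derive(Debug, Clone, PartialEq)]
pub struct CreateInput {
    pub title: String,
    pub priority: Option<Priority>,
    pub description: Option<String>,
    pub milestone_id: Option<i64>,
}

impl CreateInput {
    /// Checks the constraints the type system does not express:
    /// minLength/maxLength on title and minimum on milestone_id.
    pub fn validate(&self) -> Result<(), String> {
        // JSON Schema string lengths count characters, not bytes.
        let len = self.title.chars().count();
        if !(1..=200).contains(&len) {
            return Err(format!("title length {len} is outside 1..=200"));
        }
        if let Some(id) = self.milestone_id {
            if id < 1 {
                return Err(format!("milestone_id {id} is below minimum 1"));
            }
        }
        Ok(())
    }
}
```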

Atom kinds (the 4 allowed values)

| kind | Meaning | Pipe safety |
|------|---------|-------------|
| command | Mutates state (write DB, send request) | Sequential only; runtime rejects parallel if overlapping side_effects |
| query | Read-only (FTS, lookup) | Parallel-safe |
| stream | Emits a sequence over time (SSE, file tail) | Single consumer per invocation |
| transform | Pure function (input → output, no state) | Parallel-safe, cacheable |

Runtime uses kind + side_effects + idempotent to decide:

  • Can this atom be retried on failure? (needs idempotent: true OR kind=query|transform)
  • Can this atom be parallelized with another? (both are commands with non-overlapping side_effects, OR at least one is a query or transform)
  • Should output be cached? (transform with same input = deterministic)
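The three decisions above can be written as small predicates. This is a sketch of one reading of the rules, not the Runtime stream's implementation. It assumes side effects "overlap" when two tags are string-equal, and it treats pairs the rule does not cover (for example two streams) conservatively as not parallelizable. The `Kind` and `Atom` types are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Command,
    Query,
    Stream,
    Transform,
}

/// The frontmatter fields the runtime consults (hypothetical type).
pub struct Atom {
    pub kind: Kind,
    pub side_effects: Vec<String>,
    pub idempotent: bool,
}

fn is_read_only(kind: Kind) -> bool {
    matches!(kind, Kind::Query | Kind::Transform)
}

/// Retry is safe for idempotent atoms and for queries/transforms.
pub fn can_retry(atom: &Atom) -> bool {
    atom.idempotent || is_read_only(atom.kind)
}

/// Parallel execution is allowed when at least one atom is a query or
/// transform, or when both are commands with no shared side-effect tag.
pub fn can_parallelize(a: &Atom, b: &Atom) -> bool {
    if is_read_only(a.kind) || is_read_only(b.kind) {
        return true;
    }
    if a.kind == Kind::Command && b.kind == Kind::Command {
        // Assumption: overlap means an identical tag string on both sides.
        return !a.side_effects.iter().any(|e| b.side_effects.contains(e));
    }
    // Remaining pairs involve a stream and no read-only atom; the rule
    // above does not cover them, so this sketch refuses.
    false
}

/// Only transforms are deterministic enough to cache by input.
pub fn should_cache(atom: &Atom) -> bool {
    atom.kind == Kind::Transform
}
```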

Naming conventions

| Thing | Convention | Example |
|-------|------------|---------|
| Crate name | kei-<noun>, kebab-case | kei-task |
| Atom verb | lowercase, kebab-case, single word if possible | create, add-dependency, search |
| Full atom ID | <crate>::<verb> | kei-task::add-dependency |
| Side-effect domain | <op>:<domain> | write:kei-task-db, read:fs, network:anthropic-api |
| Error code | PascalCase | DuplicateTitle, InvalidPriority |
| JSON Schema file | <verb>-{input,output}.json | create-input.json |
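These conventions are simple enough to check mechanically. The sketch below parses a full atom ID and a side-effect tag using only the standard library. `parse_atom_id`, `parse_side_effect`, and `is_kebab` are hypothetical helpers; allowing digits in a verb is an assumption made so that suffixed names such as `create-v2` pass.

```rust
/// True for non-empty lowercase kebab-case: [a-z0-9] words joined by single '-'.
fn is_kebab(s: &str) -> bool {
    !s.is_empty()
        && !s.starts_with('-')
        && !s.ends_with('-')
        && !s.contains("--")
        && s.chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}

/// Splits `<crate>::<verb>` into (crate, verb), or None if it breaks the
/// naming conventions (crate must be `kei-<noun>`, both parts kebab-case).
pub fn parse_atom_id(id: &str) -> Option<(&str, &str)> {
    let (krate, verb) = id.split_once("::")?;
    if krate.starts_with("kei-") && is_kebab(krate) && is_kebab(verb) {
        Some((krate, verb))
    } else {
        None
    }
}

/// Splits a side-effect tag `<op>:<domain>` into (op, domain).
pub fn parse_side_effect(tag: &str) -> Option<(&str, &str)> {
    let (op, domain) = tag.split_once(':')?;
    if op.is_empty() || domain.is_empty() {
        return None;
    }
    Some((op, domain))
}
```

`parse_side_effect` also shows how the string form of a side effect can be turned into a structured (op, domain) pair at the point of use.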

Versioning & deprecation

  • Atoms inherit crate SemVer. kei-task::create version = kei-task Cargo.toml version.
  • Breaking change to an atom (signature change, required field added, error semantics shifted) = new atom with suffix: create-v2. Old atom gets deprecated: "use kei-task::create-v2" frontmatter.
  • Deprecated atoms stay functional for ≥ 2 minor versions, then removed.
  • Non-breaking changes (new optional input field, new output field, new error code): bump patch version, no rename.

Runtime invocation contract

The Runtime stream implements kei-runtime, which exposes:

# Invoke one atom
kei-runtime invoke kei-task::create --input '{"title":"Fix bug"}'
# → stdout: {"result": {...}, "metadata": {"duration_ms": 12, "atom": "kei-task::create"}}
# → exit 0 on success, 2 on atom error (see frontmatter errors[]), 1 on usage/IO

# Invoke a DAG
kei-runtime pipe dag.toml
# dag.toml declares:
#   [[steps]]
#   atom = "kei-task::create"
#   input = { title = "X" }
#   capture_as = "task"
#
#   [[steps]]
#   atom = "kei-task::add-dependency"
#   input = { parent = "$task.id", child = 17 }

# Discover what's installed
kei-runtime list-atoms [--kind command|query|stream|transform] [--crate kei-task]

Runtime validates at invocation: input against input_schema, output against output_schema. Mismatch = exit 2 with schema-violation error.
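The exit-code contract (0 on success, 2 on atom error or schema violation, 1 on usage/IO) fits in a single match. A minimal sketch, with `Outcome` and `exit_code` as hypothetical names:

```rust
/// Possible results of one `kei-runtime invoke` call (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success,
    /// One of the codes declared in the atom's frontmatter errors[].
    AtomError(String),
    /// Input or output did not match the declared JSON Schema.
    SchemaViolation(String),
    /// Bad arguments, unreadable files, and similar.
    UsageOrIo(String),
}

/// Maps an outcome to the process exit code described in the contract.
pub fn exit_code(outcome: &Outcome) -> i32 {
    match outcome {
        Outcome::Success => 0,
        Outcome::UsageOrIo(_) => 1,
        Outcome::AtomError(_) | Outcome::SchemaViolation(_) => 2,
    }
}
```

A CLI entry point would typically end with `std::process::exit(exit_code(&outcome))`.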

Runtime records to kei-ledger: every invocation emits a ledger row (atom-id, spec-sha, input-sha, duration, exit, errors). Same RULE 0.12 lifecycle as agent forks.
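The dag.toml sketch earlier in this section passes data between steps with `capture_as = "task"` and a later `"$task.id"`. The schema does not define that reference syntax further, so the following is only one plausible reading: a value of the form `$<capture>.<field>` is replaced by the named field of a previously captured output. `Captures` and `resolve` are hypothetical, and values are modelled as plain strings to stay within the standard library.

```rust
use std::collections::HashMap;

/// Captured step outputs: capture name -> (field name -> value as text).
pub type Captures = HashMap<String, HashMap<String, String>>;

/// Resolves one input value. `$<capture>.<field>` is looked up in
/// `captures`; any other value is returned unchanged.
pub fn resolve(value: &str, captures: &Captures) -> Result<String, String> {
    let reference = match value.strip_prefix('$') {
        Some(r) => r,
        None => return Ok(value.to_string()),
    };
    let (capture, field) = reference
        .split_once('.')
        .ok_or_else(|| format!("malformed reference: {value}"))?;
    captures
        .get(capture)
        .and_then(|fields| fields.get(field))
        .cloned()
        .ok_or_else(|| format!("unresolved reference: {value}"))
}
```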


Graph / discovery contract

The Graph stream (kei-sage as substrate) exposes:

kei-sage rank-atoms                         # PageRank over [[atom-id]] wikilinks
kei-sage related kei-task::create           # BFS from atom
kei-sage search "task create"               # FTS over atom bodies + frontmatter
kei-sage graph kei-task::create --depth=2   # GraphML export

kei-sage auto-imports on install:

  1. Walks ~/.claude/agents/_primitives/_rust/*/atoms/*.md
  2. Parses frontmatter + body
  3. Resolves [[atom-id]] wikilinks to atom nodes
  4. Resolves [[rules/RULE 0.X]] wikilinks to rule file nodes
  5. Re-indexes on file modification (inotify / fsevents)
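Steps 3 and 4 and the `related` command can be sketched together: extract `[[...]]` targets from text, then walk the resulting link graph breadth-first. This is an illustration under assumptions, not kei-sage's code. It follows outgoing links only, and `wikilinks`, `is_rule_link`, and `related` are hypothetical function names.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns the targets of all `[[...]]` wikilinks in `text`, in order.
pub fn wikilinks(text: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        match after.find("]]") {
            Some(end) => {
                out.push(&after[..end]);
                rest = &after[end + 2..];
            }
            // Unterminated link: stop scanning.
            None => break,
        }
    }
    out
}

/// Rule links look like `rules/RULE 0.12`; everything else is an atom ID.
pub fn is_rule_link(target: &str) -> bool {
    target.starts_with("rules/")
}

/// Nodes reachable from `start` within `depth` hops (BFS), excluding
/// `start` itself, in discovery order.
pub fn related(graph: &HashMap<String, Vec<String>>, start: &str, depth: usize) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    seen.insert(start.to_string());
    let mut queue = VecDeque::new();
    queue.push_back((start.to_string(), 0usize));
    let mut found = Vec::new();
    while let Some((node, d)) = queue.pop_front() {
        if d == depth {
            continue; // do not expand beyond the requested depth
        }
        for next in graph.get(&node).into_iter().flatten() {
            if seen.insert(next.clone()) {
                found.push(next.clone());
                queue.push_back((next.clone(), d + 1));
            }
        }
    }
    found
}
```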

UI (kei-forge) contract

The UI stream generates new atoms via web wizard (keisei forge):

Inputs from user (form):

  • Crate (existing or new)
  • Atom verb name (kebab-case)
  • Kind (command / query / stream / transform)
  • Input fields (JSON Schema builder UI)
  • Output fields
  • Error codes
  • Side effects

Outputs (generated on submit):

  • atoms/<verb>.md with frontmatter + skeleton body
  • atoms/schemas/<verb>-input.json + <verb>-output.json
  • src/atoms/<verb>.rs with pub fn run(input: …) -> Result<Output, Error> skeleton
  • Test file tests/<verb>_smoke.rs
  • Regenerated capabilities.toml

Postcondition: cargo check passes, kei-schema-lint passes, new atom visible to kei-runtime list-atoms.
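For orientation, a generated src/atoms/<verb>.rs for kei-task::create might resemble the sketch below. The `run` signature and the two error codes come from this document; the field types, the `http_analog` method, and the placeholder return value are assumptions, since the real skeleton is for Stream A to define.

```rust
/// Input accepted by the atom (subset of create-input.json for brevity).
#[derive(Debug, PartialEq)]
pub struct Input {
    pub title: String,
    pub priority: Option<String>,
}

/// Output shape from the frontmatter example.
#[derive(Debug, PartialEq)]
pub struct Output {
    pub id: i64,
    pub created_at: String,
}

/// Typed errors, one variant per frontmatter errors[] entry.
#[derive(Debug, PartialEq)]
pub enum Error {
    DuplicateTitle,
    InvalidPriority,
}

impl Error {
    /// The http_analog value declared in the frontmatter.
    pub fn http_analog(&self) -> u16 {
        match self {
            Error::DuplicateTitle => 409,
            Error::InvalidPriority => 400,
        }
    }
}

/// Atom entry point. The body is a placeholder: a real implementation
/// would write to the task DB and could return DuplicateTitle.
pub fn run(input: Input) -> Result<Output, Error> {
    if let Some(p) = input.priority.as_deref() {
        if !["low", "medium", "high"].contains(&p) {
            return Err(Error::InvalidPriority);
        }
    }
    Ok(Output {
        id: 1,
        created_at: "1970-01-01T00:00:00Z".to_string(),
    })
}
```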


Stream interfaces (the 4 contracts)

Here is exactly what each parallel stream can assume from this schema:

Stream A — UI (kei-forge)

  • Reads: this schema doc, JSON Schema draft-07, existing atoms/*.md as templates
  • Writes: generates new .md + .json + .rs per above contract
  • Does NOT depend on: Atoms-refactor (can work against any single atom template), Graph (independent), Runtime (independent)

Stream B — Atoms refactor

  • Reads: current 25 crates
  • Writes: atoms/<verb>.md + atoms/schemas/*.json + splits src/main.rs → src/atoms/*.rs, generates capabilities.toml via build.rs
  • Does NOT depend on: UI (can progress independently), Graph, Runtime

Stream C — Graph (kei-sage substrate)

  • Reads: ~/.claude/agents/_primitives/_rust/*/atoms/*.md (real or test fixtures)
  • Writes: extends kei-sage to auto-walk the atom corpus, resolves [[atom-id]] wikilinks, exposes rank/related/search/graph over atoms
  • Does NOT depend on: UI; depends on Atoms stream ONLY for real test corpus (can ship against fixture .md files if Atoms not done)

Stream D — Runtime (kei-runtime, NEW crate)

  • Reads: capabilities.toml files + JSON Schema files
  • Writes: new crate _primitives/_rust/kei-runtime/ with invoke, pipe, list-atoms, kei-schema-lint
  • Does NOT depend on: UI, Graph. Depends on Atoms stream ONLY for real atoms (can ship against hand-crafted test atom for initial dev)

What this schema deliberately leaves open

Things NOT specified here — intentionally left for streams to decide:

  1. Exact YAML library (serde_yaml vs yaml-rust vs …) — Rust convention choice
  2. Build.rs mechanics for capabilities.toml generation — implementation detail
  3. Web UI framework for kei-forge (HTMX / Leptos / Yew) — Stream A's call
  4. Runtime concurrency model (async tokio / sync threads / subprocess) — Stream D's call
  5. kei-sage GraphML vs Mermaid vs DOT output format — Stream C's call
  6. Atom test harness shape — streams B + D coordinate

Schema lock declaration

Once this document is approved by the user and a SCHEMA-LOCKED.md marker is committed, the schema is immutable for 6 weeks of parallel work. Breaking changes during lock period require:

  1. Explicit revocation by user
  2. All 4 stream agents paused + sync commit rebasing all streams to new schema
  3. kei-ledger entry: reason + revocation timestamp

Non-breaking additions (new optional fields, new atom kinds, new side-effect domains) are allowed during lock with standard git flow.

Open questions for review

Before we lock, call out things that might be wrong:

  1. JSON Schema draft-07 vs 2020-12? I picked draft-07 for Rust crate support. If you prefer 2020-12, say so.
  2. Atom ID format <crate>::<verb> — OK with :: separator? Alternative: <crate>/<verb> (path-like). :: is Rust-native which I prefer.
  3. side_effects as string tags vs structured? I went simple — "write:kei-task-db" is a string, parsed by runtime. Alternative: { op: "write", domain: "kei-task-db" } structured. Simpler string wins for now unless you object.
  4. capabilities.toml — auto-generated + committed? OR generated on demand + .gitignore'd? I went committed (downstream sees without rebuild). Tell me if you want generated-only.
  5. Should I ship a kei-atom-template/ in this PR too? OR leave that for Stream A (kei-forge) to own?
  6. Error model: typed per atom (current draft) vs shared error registry? Current is simpler; shared registry would enable runtime to map errors uniformly across atoms. Your call.