From 58944e15bd900fb1c4576b2e4317afc38cc6ee12 Mon Sep 17 00:00:00 2001
From: Parfii-bot
Date: Wed, 22 Apr 2026 23:39:24 +0800
Subject: [PATCH] =?UTF-8?q?docs(substrate):=20v1=20atom/capability/graph?=
 =?UTF-8?q?=20SSoT=20schema=20=E2=80=94=20DRAFT=20for=20review?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Substrate thesis requires a single source of truth before parallel work streams (UI/Atoms/Graph/Runtime) can proceed independently without drift. This document is that SSoT.

Key decisions baked in (open to revision before lock):
- Atom = one verb on a primitive, not one crate. Target ~150 atoms across current 25 crates. Crate = physical container, atom = unit of composition.
- File layout: src/atoms/.rs (code) + atoms/.md (docs with machine-parseable YAML frontmatter) + atoms/schemas/*.json (JSON Schema draft-07 for input/output) + capabilities.toml (auto-generated aggregator, committed to repo).
- Atom kinds: command / query / stream / transform. Combined with side_effects[] and idempotent flag, runtime decides retry safety, parallelism, caching.
- Naming: :: globally unique. Rust :: separator keeps it native-feeling.
- Versioning: atoms inherit crate SemVer. Breaking change to an atom = new atom (create-v2), old marked deprecated.
- Runtime contract: `kei-runtime invoke --input ` with schema validation at entry + exit, ledger row per invocation.
- Graph contract: kei-sage auto-walks atoms/*.md, resolves [[atom-id]] wikilinks, exposes rank / related / search / graph over atom corpus.
- UI contract: kei-forge web wizard generates .md + .json + .rs + test from form input; postcondition cargo check + kei-schema-lint pass.

Document declares 4 stream interfaces explicitly — each stream knows what it reads from this schema, what it writes, what it does NOT depend on from other streams. Enables true parallel work.
6 open questions flagged for user review at bottom:
1) JSON Schema draft-07 vs 2020-12
2) Atom ID separator :: vs /
3) side_effects strings vs structured
4) capabilities.toml committed vs gitignored
5) kei-atom-template in this PR or defer to Stream A
6) Error model per-atom vs shared registry

STATUS: DRAFT — awaits user approval + SCHEMA-LOCKED.md marker before parallel streams start. Once locked, breaking changes require explicit revocation + all-streams sync.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/SUBSTRATE-SCHEMA.md | 392 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 392 insertions(+)
 create mode 100644 docs/SUBSTRATE-SCHEMA.md

diff --git a/docs/SUBSTRATE-SCHEMA.md b/docs/SUBSTRATE-SCHEMA.md
new file mode 100644
index 0000000..99319e6
--- /dev/null
+++ b/docs/SUBSTRATE-SCHEMA.md
@@ -0,0 +1,392 @@
+# KeiSeiKit Substrate Schema v1
+
+**STATUS:** Draft — under review. Once approved, this document is **LOCKED** for 6 weeks of parallel stream work (RULE: breaking changes require explicit user revocation + all-streams sync).
+
+**PURPOSE:** Single Source of Truth for the atom / capability / graph schema that enables the substrate composition layer. Four parallel work streams (UI / Atoms refactor / Graph / Runtime) all depend on this contract.
+
+---
+
+## Core concept: atom = one verb
+
+An **atom** is **one verb** (one operation) on a primitive, not one crate. Example: `kei-task` crate decomposes into `kei-task::create`, `kei-task::add-dependency`, `kei-task::search`, … Each atom is independently:
+
+- Documented (one `.md` file)
+- Schema-specified (JSON Schema for input + output)
+- Callable (one Rust function)
+- Discoverable (aggregated into `capabilities.toml`)
+- Composable (runtime pipes atoms by schema compatibility)
+
+**Granularity target:** ~150 atoms across the current 25 crates. Crate = physical container; atom = unit of composition.
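
To make the "crate = container, atom = unit" split concrete, here is a minimal sketch of how a full atom ID such as `kei-task::create` could be split into its crate and verb parts. The function names `parse_atom_id` and `is_kebab_case` are hypothetical, and the exact kebab-case rule (lowercase ASCII, digits, single hyphens) is an assumption rather than something this schema pins down.

```rust
/// Kebab-case check. ASSUMPTION: "kebab-case" means lowercase ASCII letters,
/// digits and single hyphens, with no leading or trailing hyphen.
fn is_kebab_case(s: &str) -> bool {
    !s.is_empty()
        && !s.starts_with('-')
        && !s.ends_with('-')
        && !s.contains("--")
        && s.chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}

/// Split a full atom ID such as "kei-task::create" into (crate, verb).
/// Returns None when the "::" separator is missing or either side is not
/// kebab-case (which also rejects a second "::", since ':' is not allowed).
fn parse_atom_id(id: &str) -> Option<(&str, &str)> {
    let (krate, verb) = id.split_once("::")?;
    if is_kebab_case(krate) && is_kebab_case(verb) {
        Some((krate, verb))
    } else {
        None
    }
}
```

Under these assumptions `parse_atom_id("kei-task::add-dependency")` yields the crate `kei-task` and the verb `add-dependency`, while an ID without the separator is rejected.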
+
+---
+
+## File layout per crate
+
+```
+_primitives/_rust//
+├── Cargo.toml
+├── capabilities.toml       ← AUTO-GENERATED from atoms/*.md frontmatter
+│                             (build.rs runs on cargo build; commit the
+│                             generated file so CI consumers see it)
+├── src/
+│   ├── main.rs             ← CLI dispatcher — parses argv, calls atom fn
+│   ├── atoms/
+│   │   ├── mod.rs
+│   │   ├── create.rs       ← one file per atom impl, pub fn run(input: ...) -> ...
+│   │   ├── add_dependency.rs
+│   │   └── search.rs
+│   └── schema.rs           ← Rust types that match JSON Schemas
+├── atoms/                  ← HUMAN-FACING docs, machine-parseable frontmatter
+│   ├── create.md
+│   ├── add-dependency.md
+│   ├── search.md
+│   └── schemas/
+│       ├── create-input.json   ← JSON Schema draft-07
+│       ├── create-output.json
+│       ├── add-dependency-input.json
+│       └── …
+└── migrations/             ← per-crate SQLite migrations (kei-migrate)
+    └── 0001_initial.sql
+```
+
+**Why split `src/atoms/` and `atoms/`:** code lives with code (Rust convention), docs live in a flat directory easy for kei-sage to walk and for humans to scan.
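
The layout above implies a fixed mapping from an atom verb to its files: the Rust module uses underscores (`add_dependency.rs`) while the docs and schemas keep the kebab-case verb. A minimal sketch of that mapping follows; the `AtomPaths` struct and `atom_paths` function are hypothetical names used only for illustration, and all paths are relative to the crate root.

```rust
/// Files that belong to one atom, relative to the crate root.
/// HYPOTHETICAL helper type, not part of the schema itself.
#[derive(Debug, PartialEq)]
struct AtomPaths {
    source: String,
    doc: String,
    input_schema: String,
    output_schema: String,
}

/// Derive the per-atom file paths from a kebab-case verb.
/// Only the Rust module name swaps '-' for '_', because hyphens are not
/// valid in Rust module identifiers.
fn atom_paths(verb: &str) -> AtomPaths {
    let module = verb.replace('-', "_");
    AtomPaths {
        source: format!("src/atoms/{}.rs", module),
        doc: format!("atoms/{}.md", verb),
        input_schema: format!("atoms/schemas/{}-input.json", verb),
        output_schema: format!("atoms/schemas/{}-output.json", verb),
    }
}
```

A generator such as kei-forge or a lint pass could use a mapping like this to check that every atom has all four files, though how those tools actually locate files is left to the streams.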
+
+---
+
+## Atom `.md` frontmatter schema
+
+Every `atoms/.md` file MUST begin with YAML frontmatter matching this shape:
+
+```yaml
+---
+# REQUIRED
+atom: kei-task::create  # :: — globally unique ID
+kind: command  # command | query | stream | transform
+version: "0.22.3"  # inherits crate Cargo.toml version
+
+# INPUT / OUTPUT — schemas live in atoms/schemas/ (relative paths)
+input:
+  schema: schemas/create-input.json
+  required: [title]  # convenience duplication from JSON Schema for CLI help
+  example: { title: "Fix auth bug", priority: "high" }
+
+output:
+  schema: schemas/create-output.json
+  example: { id: 42, created_at: "2026-04-22T15:30:00Z" }
+
+# ERRORS — typed, documented upfront
+errors:
+  - code: DuplicateTitle
+    http_analog: 409
+    description: "A task with this title already exists under the same milestone"
+  - code: InvalidPriority
+    http_analog: 400
+    description: "Priority must be one of: low, medium, high"
+
+# SUBSTRATE HINTS — runtime uses these for DAG composition safety
+side_effects:  # [] means pure/readonly
+  - "write:kei-task-db"  # domain-prefixed; kei-db- for shared, custom for crate-private
+idempotent: false  # safe to retry? affects runtime retry logic
+timeout_ms: 5000  # default timeout; runtime enforces
+
+# LIFECYCLE
+deprecated: null  # or: "use kei-task::create-v2 — stricter validation"
+stability: stable  # experimental | beta | stable | deprecated
+
+# DISCOVERY
+keywords: [task, todo, gtd, planning]
+related:  # wikilinks rendered by kei-sage
+  - "[[kei-task::add-dependency]]"
+  - "[[kei-milestone::link]]"
+---
+```
+
+### Body (Markdown, free-form)
+
+After frontmatter, the body is **human-facing** with fixed section conventions:
+
+```markdown
+# kei-task::create
+
+Creates a new task in the DAG. Title must be unique within its milestone scope.
+
+## Example
+
+    kei-task create \
+      --title "Fix auth bug" \
+      --priority high \
+      --description "Token rotation fails on leap second"
+
+Returns JSON: `{"id": 42, "created_at": "2026-04-22T..."}`
+
+## Gotchas
+
+- Title uniqueness is per-milestone, NOT global. Two tasks `"Fix bug"` in
+  different milestones are valid.
+- `priority` is case-sensitive — `High` returns `InvalidPriority`.
+
+## Related
+- [[kei-task::add-dependency]] — link this task into DAG as parent/child
+- [[kei-milestone::link]] — group this task under a milestone
+- [[rules/RULE 0.12]] — task DAG per Agent Git Model
+```
+
+Sections `# `, `## Example`, `## Gotchas`, `## Related` are **convention, not requirement** — but recommended for uniformity so kei-sage can extract sections predictably.
+
+---
+
+## `capabilities.toml` — per-crate aggregator
+
+Auto-generated from all `atoms/*.md` frontmatter by `build.rs`. Committed to repo so downstream consumers (kei-sage, kei-runtime, kei-forge) don't need to parse YAML.
+
+```toml
+[primitive]
+name = "kei-task"
+version = "0.22.3"
+crate_path = "_primitives/_rust/kei-task"
+description = "SQLite-backed task DAG with dependencies, milestones, FTS search"
+
+[state]
+# State declaration — runtime + kei-forge need this to know where data lives
+backend = "sqlite"  # sqlite | filesystem | memory | remote
+db_env = "KEI_TASK_DB"
+db_default = "~/.claude/task/task.sqlite"
+migrations_dir = "migrations/"
+schema_version = 3
+
+[[atoms]]
+name = "create"
+full_id = "kei-task::create"
+kind = "command"
+md_path = "atoms/create.md"
+input_schema = "atoms/schemas/create-input.json"
+output_schema = "atoms/schemas/create-output.json"
+side_effects = ["write:kei-task-db"]
+idempotent = false
+timeout_ms = 5000
+stability = "stable"
+
+[[atoms]]
+name = "add-dependency"
+full_id = "kei-task::add-dependency"
+kind = "command"
+md_path = "atoms/add-dependency.md"
+# … etc
+
+[[atoms]]
+name = "search"
+full_id = "kei-task::search"
+kind = "query"
+side_effects = []  # empty = pure read
+idempotent = true
+# …
+```
+
+**Validation**: `kei-schema-lint` (new tool in Runtime stream) checks `capabilities.toml` is consistent with `atoms/*.md` frontmatter on every CI run.
+
+---
+
+## JSON Schema conventions (input / output)
+
+- **Draft:** JSON Schema **draft-07** (widely supported, `jsonschema` + `schemars` Rust crates).
+- **File naming:** `-input.json`, `-output.json`.
+- **Shared types:** put under `atoms/schemas/_shared/.json`, reference via `$ref`.
+- **Examples:** every schema MUST have `examples: [...]` (used by kei-forge live preview + runtime smoke tests).
+
+Minimal example — `atoms/schemas/create-input.json`:
+
+```json
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "$id": "kei-task/atoms/schemas/create-input.json",
+  "title": "kei-task::create input",
+  "type": "object",
+  "required": ["title"],
+  "properties": {
+    "title": { "type": "string", "minLength": 1, "maxLength": 200 },
+    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
+    "description": { "type": "string" },
+    "milestone_id": { "type": "integer", "minimum": 1 }
+  },
+  "additionalProperties": false,
+  "examples": [
+    { "title": "Fix auth bug", "priority": "high" }
+  ]
+}
+```
+
+---
+
+## Atom kinds (the 4 allowed values)
+
+| kind | Meaning | Pipe safety |
+|---|---|---|
+| `command` | Mutates state (write DB, send request) | Sequential only; runtime rejects parallel if overlapping `side_effects` |
+| `query` | Read-only (FTS, lookup) | Parallel-safe |
+| `stream` | Emits a sequence over time (SSE, file tail) | Single consumer per invocation |
+| `transform` | Pure function (input → output, no state) | Parallel-safe, cacheable |
+
+**Runtime uses `kind` + `side_effects` + `idempotent`** to decide:
+- Can this atom be retried on failure? (needs `idempotent: true` OR `kind=query|transform`)
+- Can this atom be parallelized with another? (non-overlapping `side_effects` + both commands OR at least one `query|transform`)
+- Should output be cached? (`transform` with same input = deterministic)
+
+---
+
+## Naming conventions
+
+| Thing | Convention | Example |
+|---|---|---|
+| Crate name | `kei-` kebab-case | `kei-task` |
+| Atom verb | lowercase, kebab-case, single word if possible | `create`, `add-dependency`, `search` |
+| Full atom ID | `::` | `kei-task::add-dependency` |
+| Side-effect domain | `:` | `write:kei-task-db`, `read:fs`, `network:anthropic-api` |
+| Error code | PascalCase | `DuplicateTitle`, `InvalidPriority` |
+| JSON Schema file | `-{input,output}.json` | `create-input.json` |
+
+---
+
+## Versioning & deprecation
+
+- **Atoms inherit crate SemVer.** `kei-task::create` version = `kei-task` Cargo.toml version.
+- **Breaking change to an atom** (signature change, required field added, error semantics shifted) = **new atom** with suffix: `create-v2`. Old atom gets `deprecated: "use kei-task::create-v2"` frontmatter.
+- **Deprecated atoms** stay functional for ≥ 2 minor versions, then removed.
+- **Non-breaking changes** (new optional input field, new output field, new error code): bump patch version, no rename.
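
The rename and removal rules above can be sketched as two small helpers. Both names (`successor_verb`, `may_remove`) are hypothetical. Two points are assumptions on my part, not decisions of this schema: that the suffix keeps counting (`create-v2` is followed by `create-v3`), and that "≥ 2 minor versions" means an atom deprecated in minor `m` may be removed from minor `m + 2` onward (or at any later major).

```rust
/// (major, minor, patch), taken from the crate's Cargo.toml version.
type Version = (u32, u32, u32);

/// Name of the atom that replaces `verb` after a breaking change.
/// "create" -> "create-v2" follows the rule in the text.
/// ASSUMPTION: an existing "-vN" suffix is incremented ("create-v2" -> "create-v3").
fn successor_verb(verb: &str) -> String {
    if let Some((base, n)) = verb.rsplit_once("-v") {
        if let Ok(n) = n.parse::<u32>() {
            return format!("{}-v{}", base, n + 1);
        }
    }
    format!("{}-v2", verb)
}

/// Whether a deprecated atom may be removed in `current`.
/// ASSUMPTION: deprecated in minor m means removable from minor m + 2 of the
/// same major line, and a later major version always allows removal.
fn may_remove(deprecated_in: Version, current: Version) -> bool {
    if current.0 != deprecated_in.0 {
        return current.0 > deprecated_in.0;
    }
    current.1 >= deprecated_in.1 + 2
}
```

If the lock review settles on a different reading of the two-minor-version window, only the comparison in `may_remove` would need to change.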
+
+---
+
+## Runtime invocation contract
+
+The Runtime stream implements `kei-runtime` that exposes:
+
+```bash
+# Invoke one atom
+kei-runtime invoke kei-task::create --input '{"title":"Fix bug"}'
+# → stdout: {"result": {...}, "metadata": {"duration_ms": 12, "atom": "kei-task::create"}}
+# → exit 0 on success, 2 on atom error (see frontmatter errors[]), 1 on usage/IO
+
+# Invoke a DAG
+kei-runtime pipe dag.toml
+# dag.toml declares:
+#   [[steps]]
+#   atom = "kei-task::create"
+#   input = { title = "X" }
+#   capture_as = "task"
+#
+#   [[steps]]
+#   atom = "kei-task::add-dependency"
+#   input = { parent = "$task.id", child = 17 }
+
+# Discover what's installed
+kei-runtime list-atoms [--kind command|query|…] [--crate kei-task]
+```
+
+**Runtime validates at invocation:** input against `input_schema`, output against `output_schema`. Mismatch = exit 2 with schema-violation error.
+
+**Runtime records to `kei-ledger`:** every invocation emits a ledger row (atom-id, spec-sha, input-sha, duration, exit, errors). Same RULE 0.12 lifecycle as agent forks.
+
+---
+
+## Graph / discovery contract
+
+The Graph stream (kei-sage as substrate) exposes:
+
+```bash
+kei-sage rank-atoms                         # PageRank over [[atom-id]] wikilinks
+kei-sage related kei-task::create           # BFS from atom
+kei-sage search "task create"               # FTS over atom bodies + frontmatter
+kei-sage graph kei-task::create --depth=2   # GraphML export
+```
+
+`kei-sage` auto-imports on install:
+1. Walks `~/.claude/agents/_primitives/_rust/*/atoms/*.md`
+2. Parses frontmatter + body
+3. Resolves `[[atom-id]]` wikilinks to atom nodes
+4. Resolves `[[rules/RULE 0.X]]` wikilinks to rule file nodes
+5. Re-indexes on file modification (inotify / fsevents)
+
+---
+
+## UI (kei-forge) contract
+
+The UI stream generates new atoms via web wizard (`keisei forge`):
+
+**Inputs from user (form):**
+- Crate (existing or new)
+- Atom verb name (kebab-case)
+- Kind (command / query / stream / transform)
+- Input fields (JSON Schema builder UI)
+- Output fields
+- Error codes
+- Side effects
+
+**Outputs (generated on submit):**
+- `atoms/.md` with frontmatter + skeleton body
+- `atoms/schemas/-input.json` + `-output.json`
+- `src/atoms/.rs` with `pub fn run(input: …) -> Result` skeleton
+- Test file `tests/_smoke.rs`
+- Regenerated `capabilities.toml`
+
+**Postcondition:** `cargo check` passes, `kei-schema-lint` passes, new atom visible to `kei-runtime list-atoms`.
+
+---
+
+## Stream interfaces (the 4 contracts)
+
+Here is exactly what each parallel stream can assume from this schema:
+
+### Stream A — UI (kei-forge)
+- **Reads:** this schema doc, JSON Schema draft-07, existing `atoms/*.md` as templates
+- **Writes:** generates new `.md` + `.json` + `.rs` per above contract
+- **Does NOT depend on:** Atoms-refactor (can work against any single atom template), Graph (independent), Runtime (independent)
+
+### Stream B — Atoms refactor
+- **Reads:** current 25 crates
+- **Writes:** `atoms/.md` + `atoms/schemas/*.json` + splits `src/main.rs` → `src/atoms/*.rs`, generates `capabilities.toml` via build.rs
+- **Does NOT depend on:** UI (can progress independently), Graph, Runtime
+
+### Stream C — Graph (kei-sage substrate)
+- **Reads:** `~/.claude/agents/_primitives/_rust/*/atoms/*.md` (real or test fixtures)
+- **Writes:** extends `kei-sage` to auto-walk the atom corpus, resolves `[[atom-id]]` wikilinks, exposes rank/related/search/graph over atoms
+- **Does NOT depend on:** UI; depends on Atoms stream ONLY for real test corpus (can ship against fixture .md files if Atoms not done)
+
+### Stream D — Runtime (kei-runtime, NEW crate)
+- **Reads:** `capabilities.toml` files + JSON Schema files
+- **Writes:** new crate `_primitives/_rust/kei-runtime/` with `invoke`, `pipe`, `list-atoms`, `kei-schema-lint`
+- **Does NOT depend on:** UI, Graph. Depends on Atoms stream ONLY for real atoms (can ship against hand-crafted test atom for initial dev)
+
+---
+
+## What this schema deliberately leaves open
+
+Things NOT specified here — intentionally left for streams to decide:
+
+1. **Exact YAML library** (serde_yaml vs yaml-rust vs …) — Rust convention choice
+2. **Build.rs mechanics** for capabilities.toml generation — implementation detail
+3. **Web UI framework** for kei-forge (HTMX / Leptos / Yew) — Stream A's call
+4. **Runtime concurrency model** (async tokio / sync threads / subprocess) — Stream D's call
+5. **kei-sage GraphML vs Mermaid vs DOT** output format — Stream C's call
+6. **Atom test harness** shape — streams B + D coordinate
+
+---
+
+## Schema lock declaration
+
+Once this document is approved by the user and a `SCHEMA-LOCKED.md` marker is committed, the schema is **immutable for 6 weeks** of parallel work. Breaking changes during lock period require:
+
+1. Explicit revocation by user
+2. All 4 stream agents paused + sync commit rebasing all streams to new schema
+3. `kei-ledger` entry: reason + revocation timestamp
+
+Non-breaking additions (new optional fields, new atom kinds, new side-effect domains) are allowed during lock with standard git flow.
+
+## Open questions for review
+
+Before we lock, call out things that might be wrong:
+
+1. **JSON Schema draft-07 vs 2020-12?** I picked draft-07 for Rust crate support. If you prefer 2020-12, say so.
+2. **Atom ID format `::` — OK with `::` separator?** Alternative: `/` (path-like). `::` is Rust-native which I prefer.
+3. **`side_effects` as string tags vs structured?** I went simple — `"write:kei-task-db"` is a string, parsed by runtime. Alternative: `{ op: "write", domain: "kei-task-db" }` structured. Simpler string wins for now unless you object.
+4. **`capabilities.toml` — auto-generated + committed?** OR generated on demand + `.gitignore`'d? I went committed (downstream sees without rebuild). Tell me if you want generated-only.
+5. **Should I ship a `kei-atom-template/` in this PR too?** OR leave that for Stream A (kei-forge) to own?
+6. **Error model: typed per atom (current draft) vs shared error registry?** Current is simpler; shared registry would enable runtime to map errors uniformly across atoms. Your call.
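
To help weigh question 3, here is a minimal sketch of what "string tag, parsed by runtime" could look like: the tag `"write:kei-task-db"` is split into the same `op` / `domain` pair the structured alternative would carry. The names `SideEffect`, `parse_side_effect` and `overlaps` are hypothetical. The overlap rule is also an assumption: this draft does not define "overlapping `side_effects`", so the sketch conservatively treats any shared domain as an overlap.

```rust
/// Structured form of one side-effect tag, e.g. "write:kei-task-db".
#[derive(Debug, PartialEq, Eq)]
struct SideEffect {
    op: String,
    domain: String,
}

/// Parse "op:domain" at the first ':'; None if either part is missing.
fn parse_side_effect(tag: &str) -> Option<SideEffect> {
    let (op, domain) = tag.split_once(':')?;
    if op.is_empty() || domain.is_empty() {
        return None;
    }
    Some(SideEffect {
        op: op.to_string(),
        domain: domain.to_string(),
    })
}

/// ASSUMPTION: two atoms overlap when they name the same domain, whatever
/// the op. Malformed tags are skipped here; a real runtime would more
/// likely reject them at lint time.
fn overlaps(a: &[&str], b: &[&str]) -> bool {
    let domains_b: Vec<String> = b
        .iter()
        .filter_map(|t| parse_side_effect(t))
        .map(|e| e.domain)
        .collect();
    a.iter()
        .filter_map(|t| parse_side_effect(t))
        .any(|e| domains_b.contains(&e.domain))
}
```

The parsing cost of the string form is a single `split_once`, so the choice between the two representations is mostly about validation: the structured form lets JSON Schema constrain `op`, while the string form leaves that check to the runtime or to `kei-schema-lint`.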