feat(substrate): apply user decisions + ship atom template + generator

Schema revisions per user review 2026-04-22 (all 6 open questions resolved
— see §Decision log in SUBSTRATE-SCHEMA.md):

- #3 side_effects: string tags → structured { op, domain } objects (user:
  "better to build in headroom from the start")
- #4 capabilities.toml: DROPPED entirely (user: "why not md?"). SSoT is
  atoms/*.md. Crate-level metadata moves to Cargo.toml
  [package.metadata.keisei] — Cargo-native, no drift, no build.rs, no
  generated files to commit. kei-sage + kei-runtime walk atoms/*.md
  directly.
- #5 atom template: shipped in this PR (user: "the UI is in parallel anyway!
  create everything!") so Streams B/C/D can scaffold atoms from day 0 without
  waiting for Stream A (kei-forge UI).
- #1/#2/#6 confirmed as drafted (draft-07, `::` separator, per-atom errors).

New files:

- _templates/atom/ — 5-file template set with placeholder substitution
  (__CRATE__, __VERB__, __KIND__, __DESCRIPTION__ etc). Covers
  atoms/<verb>.md, schemas/<verb>-{input,output}.json, src/atoms/<verb>.rs,
  tests/<verb>_smoke.rs. Each file is a minimal working skeleton.
- scripts/new-atom.sh: Bash generator, not POSIX sh (bash for $'\n' / readonly /
  trap). Validates verb is lowercase kebab-case, kind is one of
  command|query|stream|transform. Refuses to overwrite existing files.
  Rolls back on any failure (trap ERR deletes all generated files so no
  half-scaffolded state). Tested: produces 5 files, placeholder
  substitution correct on smoke-test crate.

Stream B (atoms refactor) updated to drop the "generates capabilities.toml
via build.rs" wording — now just "writes atoms/*.md + updates Cargo.toml
[package.metadata.keisei]". Stream D reads atoms/*.md + Cargo.toml, not
capabilities.toml.

Schema status: revisions applied, decision log complete. Ready for
SCHEMA-LOCKED.md marker commit once user signs off on revised doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parfii-bot 2026-04-22 23:53:26 +08:00
parent 58944e15bd
commit 559db303e1
8 changed files with 308 additions and 53 deletions

_templates/atom/README.md Normal file

@ -0,0 +1,30 @@
# Atom template
Used by `scripts/new-atom.sh <crate> <verb> [kind]` to scaffold a new atom. Placeholder substitution map:
| Placeholder | Example | Source |
|---|---|---|
| `__CRATE__` | `kei-task` | argv 1 (kebab-case) |
| `__CRATE_SNAKE__` | `kei_task` | argv 1 → underscores |
| `__VERB__` | `add-dependency` | argv 2 (kebab-case) |
| `__VERB_SNAKE__` | `add_dependency` | argv 2 → underscores |
| `__KIND__` | `command` | argv 3 or default `command` |
| `__DESCRIPTION__` | free-form one-liner | prompted at runtime; `ATOM_DESCRIPTION` env var when stdin is not a TTY |
Schema SSoT: [SUBSTRATE-SCHEMA.md](../../docs/SUBSTRATE-SCHEMA.md).
Template covers the 5 files a new atom always needs:
- `atoms/<verb>.md` — human doc + YAML frontmatter (machine-parsed by kei-sage + kei-runtime)
- `atoms/schemas/<verb>-input.json` — JSON Schema draft-07
- `atoms/schemas/<verb>-output.json` — JSON Schema draft-07
- `src/atoms/<verb>.rs` — Rust impl skeleton with Input/Output/Error + `pub fn run`
- `tests/<verb>_smoke.rs` — smoke test placeholder
Postconditions a freshly scaffolded atom should satisfy (checked manually for now; `scripts/new-atom.sh` itself only guards the file generation step):
1. `cargo check -p <crate>` passes (skeleton compiles)
2. `kei-schema-lint <crate>` passes (frontmatter + schema paths valid)
3. New atom appears in `kei-runtime list-atoms --crate <crate>`
If file generation fails partway, the generator rolls back (deletes the generated files) so there is no half-scaffolded state.
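
The substitution map in the table above can be sketched in Rust as follows. This is only an illustration of the mapping, not the generator itself (the shipped generator is the bash script `scripts/new-atom.sh`); the `Placeholders` struct and `render` function are hypothetical names introduced here.

```rust
/// Values substituted into an atom template (hypothetical helper type;
/// field meanings mirror the placeholder table).
pub struct Placeholders<'a> {
    pub crate_name: &'a str,  // __CRATE__, kebab-case
    pub verb: &'a str,        // __VERB__, kebab-case
    pub kind: &'a str,        // __KIND__
    pub description: &'a str, // __DESCRIPTION__
}

/// Replace every placeholder in `template`. The `*_SNAKE__` forms are
/// derived by turning hyphens into underscores. The description is
/// substituted last so its text is never re-scanned for placeholders.
pub fn render(template: &str, p: &Placeholders) -> String {
    let crate_snake = p.crate_name.replace('-', "_");
    let verb_snake = p.verb.replace('-', "_");
    template
        .replace("__CRATE_SNAKE__", &crate_snake)
        .replace("__VERB_SNAKE__", &verb_snake)
        .replace("__CRATE__", p.crate_name)
        .replace("__VERB__", p.verb)
        .replace("__KIND__", p.kind)
        .replace("__DESCRIPTION__", p.description)
}
```

With `crate_name = "kei-task"` and `verb = "add-dependency"`, the frontmatter line `atom: __CRATE__::__VERB__` renders as `atom: kei-task::add-dependency`.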

_templates/atom/atoms/__VERB__.md.template

@ -0,0 +1,42 @@
---
atom: __CRATE__::__VERB__
kind: __KIND__
version: "0.1.0"
input:
schema: schemas/__VERB__-input.json
required: []
example: {}
output:
schema: schemas/__VERB__-output.json
example: {}
errors: []
side_effects: []
idempotent: true
timeout_ms: 5000
deprecated: null
stability: experimental
keywords: []
related: []
---
# __CRATE__::__VERB__
__DESCRIPTION__
## Example
__CRATE__ __VERB__ [args...]
## Gotchas
_TODO: document non-obvious behaviour_
## Related
_TODO: add [[atom-id]] wikilinks to related atoms_

_templates/atom/atoms/schemas/__VERB__-input.json.template

@ -0,0 +1,9 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "__CRATE__/atoms/schemas/__VERB__-input.json",
"title": "__CRATE__::__VERB__ input",
"type": "object",
"properties": {},
"additionalProperties": false,
"examples": [{}]
}

_templates/atom/atoms/schemas/__VERB__-output.json.template

@ -0,0 +1,9 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "__CRATE__/atoms/schemas/__VERB__-output.json",
"title": "__CRATE__::__VERB__ output",
"type": "object",
"properties": {},
"additionalProperties": false,
"examples": [{}]
}

_templates/atom/src/atoms/__VERB_SNAKE__.rs.template

@ -0,0 +1,28 @@
//! __CRATE__::__VERB__ atom implementation.
//!
//! See `atoms/__VERB__.md` for the human-facing spec and frontmatter.
//! See `atoms/schemas/__VERB__-{input,output}.json` for the wire shape.
use serde::{Deserialize, Serialize};
#[derive(Debug, Deserialize)]
pub struct Input {
// TODO: fields matching schemas/__VERB__-input.json
}
#[derive(Debug, Serialize)]
pub struct Output {
// TODO: fields matching schemas/__VERB__-output.json
}
#[derive(Debug, thiserror::Error)]
pub enum Error {
// TODO: error codes matching frontmatter `errors:` list
#[error("not implemented")]
NotImplemented,
}
/// Entry point — called by `src/main.rs` CLI dispatcher and by `kei-runtime invoke`.
pub fn run(_input: Input) -> Result<Output, Error> {
Err(Error::NotImplemented)
}

_templates/atom/tests/__VERB_SNAKE___smoke.rs.template

@ -0,0 +1,13 @@
//! Smoke test for __CRATE__::__VERB__.
//!
//! Minimal happy-path + one error-path. Schema compliance is enforced by
//! `kei-schema-lint` separately; these tests verify actual function behaviour.
// TODO: replace path once crate is wired
// use __CRATE_SNAKE__::atoms::__VERB_SNAKE__::{run, Input};
#[test]
fn smoke_placeholder() {
// TODO: real test once `run` is implemented
assert!(true, "replace with real assertions before shipping");
}

docs/SUBSTRATE-SCHEMA.md

@ -1,6 +1,6 @@
# KeiSeiKit Substrate Schema v1
**STATUS:** Draft — under review. Once approved, this document is **LOCKED** for 6 weeks of parallel stream work (RULE: breaking changes require explicit user revocation + all-streams sync).
**STATUS:** Revised after user review (2026-04-22). Open questions resolved inline in §"Decision log" at bottom. Once `SCHEMA-LOCKED.md` marker is committed, this document is **LOCKED** for 6 weeks of parallel stream work (RULE: breaking changes require explicit user revocation + all-streams sync).
**PURPOSE:** Single Source of Truth for the atom / capability / graph schema that enables the substrate composition layer. Four parallel work streams (UI / Atoms refactor / Graph / Runtime) all depend on this contract.
@ -24,10 +24,9 @@ An **atom** is **one verb** (one operation) on a primitive, not one crate. Examp
```
_primitives/_rust/<crate>/
├── Cargo.toml
├── capabilities.toml ← AUTO-GENERATED from atoms/*.md frontmatter
│ (build.rs runs on cargo build; commit the
│ generated file so CI consumers see it)
├── Cargo.toml ← includes [package.metadata.keisei] for
│ crate-level substrate data (see §Cargo
│ metadata below)
├── src/
│ ├── main.rs ← CLI dispatcher — parses argv, calls atom fn
│ ├── atoms/
@ -36,7 +35,7 @@ _primitives/_rust/<crate>/
│ │ ├── add_dependency.rs
│ │ └── search.rs
│ └── schema.rs ← Rust types that match JSON Schemas
├── atoms/ ← HUMAN-FACING docs, machine-parseable frontmatter
├── atoms/ ← SSoT for atoms — docs + machine-parseable frontmatter
│ ├── create.md
│ ├── add-dependency.md
│ ├── search.md
@ -51,6 +50,8 @@ _primitives/_rust/<crate>/
**Why split `src/atoms/` and `atoms/`:** code lives with code (Rust convention), docs live in a flat directory easy for kei-sage to walk and for humans to scan.
**No `capabilities.toml` aggregator.** Per user review (2026-04-22): aggregated files cause drift vs source truth. `atoms/*.md` is the ONLY atom source. `kei-sage` walks `.md` files directly; `kei-runtime list-atoms` walks filesystem on demand. Crate-level metadata (db backend, env vars, migrations dir) lives in `Cargo.toml [package.metadata.keisei]` — already a first-class Cargo mechanism.
---
## Atom `.md` frontmatter schema
@ -85,7 +86,10 @@ errors:
# SUBSTRATE HINTS — runtime uses these for DAG composition safety
side_effects: # [] means pure/readonly
- "write:kei-task-db" # domain-prefixed; kei-db-<name> for shared, custom for crate-private
- { op: write, domain: kei-task-db } # structured — type-safe, extensible
- { op: read, domain: fs }
# op: read | write | network | subprocess | other
# domain: free-form, conventionally <crate-name>-db for DB / fs / <api-name>
idempotent: false # safe to retry? affects runtime retry logic
timeout_ms: 5000 # default timeout; runtime enforces
@ -135,54 +139,49 @@ Sections `# <atom-id>`, `## Example`, `## Gotchas`, `## Related` are **conventio
---
## `capabilities.toml` — per-crate aggregator
## Crate-level metadata — `Cargo.toml [package.metadata.keisei]`
Auto-generated from all `atoms/*.md` frontmatter by `build.rs`. Committed to repo so downstream consumers (kei-sage, kei-runtime, kei-forge) don't need to parse YAML.
Crate-level data (db backend, env vars, migrations) lives in a Cargo-native `[package.metadata.*]` section. Cargo reserves `[package.metadata.*]` explicitly for tool-specific extensions — no spec violation, no third-party file.
```toml
[primitive]
# _primitives/_rust/kei-task/Cargo.toml
[package]
name = "kei-task"
version = "0.22.3"
crate_path = "_primitives/_rust/kei-task"
description = "SQLite-backed task DAG with dependencies, milestones, FTS search"
# … rest of Cargo.toml unchanged
[state]
# State declaration — runtime + kei-forge need this to know where data lives
backend = "sqlite" # sqlite | filesystem | memory | remote
[package.metadata.keisei]
# Substrate declares crate-level state — atoms themselves are in atoms/*.md
backend = "sqlite" # sqlite | filesystem | memory | remote
db_env = "KEI_TASK_DB"
db_default = "~/.claude/task/task.sqlite"
migrations_dir = "migrations/"
schema_version = 3
[[atoms]]
name = "create"
full_id = "kei-task::create"
kind = "command"
md_path = "atoms/create.md"
input_schema = "atoms/schemas/create-input.json"
output_schema = "atoms/schemas/create-output.json"
side_effects = ["write:kei-task-db"]
idempotent = false
timeout_ms = 5000
stability = "stable"
[[atoms]]
name = "add-dependency"
full_id = "kei-task::add-dependency"
kind = "command"
md_path = "atoms/add-dependency.md"
# … etc
[[atoms]]
name = "search"
full_id = "kei-task::search"
kind = "query"
side_effects = [] # empty = pure read
idempotent = true
# …
```
**Validation**: `kei-schema-lint` (new tool in Runtime stream) checks `capabilities.toml` is consistent with `atoms/*.md` frontmatter on every CI run.
Atoms are discovered by walking `atoms/*.md` and parsing frontmatter. No aggregator file, no build.rs regeneration, no drift.
**Discovery:**
```bash
# Runtime lists atoms — walks filesystem on demand (~ms for 150 atoms)
kei-runtime list-atoms [--crate kei-task] [--kind command]
# → reads atoms/*.md frontmatter across ~/.claude/agents/_primitives/_rust/*/
# Sage indexes atoms — walks on install + inotify rebuild on change
kei-sage rank-atoms
# → same corpus, persisted to ~/.claude/sage/vault.sqlite for FTS + PageRank
```
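
A minimal sketch of what "walking `atoms/*.md` and parsing frontmatter" could look like for a single crate, using only the Rust standard library. It is an assumption-laden illustration, not the `kei-runtime` implementation: `atom_id` and `list_atoms` are hypothetical names, and the frontmatter is scanned line by line for the `atom:` key rather than parsed as YAML.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Extract the `atom:` id from the frontmatter of one atom doc.
/// Naive line scan between the opening and closing `---`, not a YAML parser.
pub fn atom_id(md: &str) -> Option<String> {
    let mut lines = md.lines();
    if lines.next()?.trim_end() != "---" {
        return None; // no frontmatter block
    }
    for line in lines {
        let line = line.trim_end();
        if line == "---" {
            break; // end of frontmatter
        }
        if let Some(value) = line.strip_prefix("atom:") {
            return Some(value.trim().to_string());
        }
    }
    None
}

/// List atom ids found in `<crate_dir>/atoms/*.md`, sorted for stable output.
pub fn list_atoms(crate_dir: &Path) -> io::Result<Vec<String>> {
    let mut ids = Vec::new();
    for entry in fs::read_dir(crate_dir.join("atoms"))? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) == Some("md") {
            if let Some(id) = atom_id(&fs::read_to_string(&path)?) {
                ids.push(id);
            }
        }
    }
    ids.sort();
    Ok(ids)
}
```

Because nothing is cached, the result always reflects the files on disk, which is the "no drift" property the text argues for.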
**Validation**: `kei-schema-lint` (new tool in Runtime stream) validates:
1. Every `atoms/*.md` has valid frontmatter matching the schema above
2. Every `schema` path in frontmatter points to an existing JSON Schema file
3. Every `[[related]]` wikilink target exists (atom or rule)
4. `Cargo.toml [package.metadata.keisei]` has required fields
Runs in CI per-crate + globally across all installed primitives.
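
Check 2 in the list above (every `schema` path resolves to a file) might be sketched like this. `kei-schema-lint` is not part of this commit, so `schema_paths` and `missing_schemas` are hypothetical helpers; the sketch assumes the frontmatter starts on the first line, that each schema reference sits on its own `schema: <path>` line as in the template, and that paths are relative to the crate's `atoms/` directory.

```rust
use std::path::{Path, PathBuf};

/// Collect `schema:` values from the frontmatter block.
/// Naive scan: skips the opening `---` and stops at the closing one.
pub fn schema_paths(md: &str) -> Vec<PathBuf> {
    md.lines()
        .skip(1) // opening `---` (assumed to be the first line)
        .take_while(|l| l.trim_end() != "---")
        .filter_map(|l| l.trim().strip_prefix("schema:"))
        .map(|v| PathBuf::from(v.trim()))
        .collect()
}

/// Return the schema paths that do not resolve to a file under `atoms_dir`.
pub fn missing_schemas(md: &str, atoms_dir: &Path) -> Vec<PathBuf> {
    schema_paths(md)
        .into_iter()
        .filter(|p| !atoms_dir.join(p).is_file())
        .collect()
}
```

An empty result means the check passes for that atom; a lint run would report every returned path as an error.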
---
@ -342,8 +341,8 @@ Here is exactly what each parallel stream can assume from this schema:
### Stream B — Atoms refactor
- **Reads:** current 25 crates
- **Writes:** `atoms/<verb>.md` + `atoms/schemas/*.json` + splits `src/main.rs` → `src/atoms/*.rs`, generates `capabilities.toml` via build.rs
- **Does NOT depend on:** UI (can progress independently), Graph, Runtime
- **Writes:** `atoms/<verb>.md` + `atoms/schemas/*.json` + splits `src/main.rs` → `src/atoms/*.rs`, adds `[package.metadata.keisei]` to each `Cargo.toml`
- **Does NOT depend on:** UI (can progress independently), Graph, Runtime. No build.rs, no generated files — atoms/*.md is SSoT.
### Stream C — Graph (kei-sage substrate)
- **Reads:** `~/.claude/agents/_primitives/_rust/*/atoms/*.md` (real or test fixtures)
@ -351,7 +350,7 @@ Here is exactly what each parallel stream can assume from this schema:
- **Does NOT depend on:** UI; depends on Atoms stream ONLY for real test corpus (can ship against fixture .md files if Atoms not done)
### Stream D — Runtime (kei-runtime, NEW crate)
- **Reads:** `capabilities.toml` files + JSON Schema files
- **Reads:** `atoms/*.md` frontmatter + JSON Schema files + `Cargo.toml [package.metadata.keisei]`
- **Writes:** new crate `_primitives/_rust/kei-runtime/` with `invoke`, `pipe`, `list-atoms`, `kei-schema-lint`
- **Does NOT depend on:** UI, Graph. Depends on Atoms stream ONLY for real atoms (can ship against hand-crafted test atom for initial dev)
@ -380,13 +379,15 @@ Once this document is approved by the user and a `SCHEMA-LOCKED.md` marker is co
Non-breaking additions (new optional fields, new atom kinds, new side-effect domains) are allowed during lock with standard git flow.
## Open questions for review
## Decision log — resolved 2026-04-22
Before we lock, call out things that might be wrong:
| # | Question | Decision | Rationale |
|---|---|---|---|
| 1 | JSON Schema draft-07 vs 2020-12 | **draft-07** | Stable and widely supported by Rust JSON Schema crates. Migrating later means a sed pass plus a validator-library bump, not a catastrophe. |
| 2 | Atom ID separator `::` vs `/` | **`::`** | Rust-native (`std::fs::read`). Cost: quoting in shell (`"kei-task::create"`). Accepted. |
| 3 | `side_effects` string vs structured object | **structured `{ op, domain }`** | Type-safe; a third field can be added later without migration. "With headroom." |
| 4 | `capabilities.toml` committed vs gitignored | **DROP entirely** | Aggregator = drift risk + double maintenance. SSoT is `atoms/*.md`. Crate-level metadata moves to `Cargo.toml [package.metadata.keisei]` (Cargo-native mechanism). kei-sage + kei-runtime walk filesystem directly. |
| 5 | `kei-atom-template/` in this PR or defer to Stream A | **Include in this PR** | Template + `scripts/new-atom.sh` ships together with schema. Streams B/C/D can test-drive atom creation from day 0 without waiting for UI. UI (Stream A) wraps the same template in web wizard. |
| 6 | Error model per-atom vs shared registry | **Per-atom** | Simpler to start. Registry can be added later non-breakingly. |
1. **JSON Schema draft-07 vs 2020-12?** I picked draft-07 for Rust crate support. If you prefer 2020-12, say so.
2. **Atom ID format `<crate>::<verb>` — OK with `::` separator?** Alternative: `<crate>/<verb>` (path-like). `::` is Rust-native which I prefer.
3. **`side_effects` as string tags vs structured?** I went simple — `"write:kei-task-db"` is a string, parsed by runtime. Alternative: `{ op: "write", domain: "kei-task-db" }` structured. Simpler string wins for now unless you object.
4. **`capabilities.toml` — auto-generated + committed?** OR generated on demand + `.gitignore`'d? I went committed (downstream sees without rebuild). Tell me if you want generated-only.
5. **Should I ship a `kei-atom-template/` in this PR too?** OR leave that for Stream A (kei-forge) to own?
6. **Error model: typed per atom (current draft) vs shared error registry?** Current is simpler; shared registry would enable runtime to map errors uniformly across atoms. Your call.
**Locked values:** all of the above. Breaking changes to any of these during 6-week parallel window require explicit user revocation + all-streams sync + ledger row.
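
Decision 3 replaces string tags such as `"write:kei-task-db"` with `{ op, domain }` objects. A possible Rust shape for that object is sketched below, together with a converter for the old tag format. The type names, `from_legacy_tag`, and `is_read_only` are assumptions made for illustration; the schema only fixes the `op` values (`read | write | network | subprocess | other`) and the free-form `domain`.

```rust
/// Operation kinds listed in the frontmatter schema comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Read,
    Write,
    Network,
    Subprocess,
    Other,
}

/// One structured `side_effects` entry: `{ op, domain }`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SideEffect {
    pub op: Op,
    pub domain: String,
}

impl SideEffect {
    /// Convert a legacy "op:domain" string tag (hypothetical migration helper).
    /// Returns None for unknown ops, a missing colon, or an empty domain.
    pub fn from_legacy_tag(tag: &str) -> Option<SideEffect> {
        let (op, domain) = tag.split_once(':')?;
        let op = match op {
            "read" => Op::Read,
            "write" => Op::Write,
            "network" => Op::Network,
            "subprocess" => Op::Subprocess,
            "other" => Op::Other,
            _ => return None,
        };
        if domain.is_empty() {
            return None;
        }
        Some(SideEffect { op, domain: domain.to_string() })
    }
}

/// True when the list is empty or contains only reads.
pub fn is_read_only(effects: &[SideEffect]) -> bool {
    effects.iter().all(|e| e.op == Op::Read)
}
```

A closed `Op` enum lets the compiler flag unhandled operations when a runtime matches on it, which is one way to read the "type-safe" rationale in the decision log.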

scripts/new-atom.sh Executable file

@ -0,0 +1,123 @@
#!/usr/bin/env bash
# new-atom.sh — scaffold a new atom per SUBSTRATE-SCHEMA.md
#
# Usage:
# scripts/new-atom.sh <crate> <verb> [kind]
#
# Example:
# scripts/new-atom.sh kei-task add-dependency command
#
# Kinds (per schema §Atom kinds): command | query | stream | transform
# Default kind: command
# -E (errtrace) lets the ERR trap below fire for failures inside functions
set -Eeuo pipefail
CRATE="${1:?usage: new-atom.sh <crate> <verb> [kind]}"
VERB="${2:?usage: new-atom.sh <crate> <verb> [kind]}"
KIND="${3:-command}"
# Validate kind against schema
case "$KIND" in
command|query|stream|transform) ;;
*) echo "error: kind must be one of: command, query, stream, transform" >&2; exit 1 ;;
esac
# Validate verb naming (kebab-case, lowercase)
if ! [[ "$VERB" =~ ^[a-z][a-z0-9]*(-[a-z0-9]+)*$ ]]; then
echo "error: verb must be lowercase kebab-case (got '$VERB')" >&2
exit 1
fi
# Repo root = parent of the directory containing this script (scripts/)
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
CRATE_DIR="$ROOT/_primitives/_rust/$CRATE"
TEMPLATE_DIR="$ROOT/_templates/atom"
if [ ! -d "$CRATE_DIR" ]; then
echo "error: crate directory not found: $CRATE_DIR" >&2
echo "hint: create the crate first (e.g. via 'cargo new --lib $CRATE_DIR')" >&2
exit 1
fi
VERB_SNAKE="${VERB//-/_}"
CRATE_SNAKE="${CRATE//-/_}"
# Target files
MD_OUT="$CRATE_DIR/atoms/$VERB.md"
IN_OUT="$CRATE_DIR/atoms/schemas/$VERB-input.json"
OUT_OUT="$CRATE_DIR/atoms/schemas/$VERB-output.json"
RS_OUT="$CRATE_DIR/src/atoms/$VERB_SNAKE.rs"
TEST_OUT="$CRATE_DIR/tests/${VERB_SNAKE}_smoke.rs"
# Refuse to overwrite
for f in "$MD_OUT" "$IN_OUT" "$OUT_OUT" "$RS_OUT" "$TEST_OUT"; do
if [ -e "$f" ]; then
echo "error: file already exists: $f" >&2
echo "hint: pick a different verb, or delete the existing file first" >&2
exit 1
fi
done
# Prompt for description (stdin-friendly, non-interactive if piped)
if [ -t 0 ]; then
read -rp "One-line description: " DESCRIPTION
else
DESCRIPTION="${ATOM_DESCRIPTION:-TODO: add description}"
fi
# Escape for sed: backslash, ampersand, and our forward-slash delimiter are
# special in the replacement text, so prefix each with a backslash
DESCRIPTION_ESCAPED="$(printf '%s' "$DESCRIPTION" | sed 's|[\\/&]|\\&|g')"
mkdir -p "$CRATE_DIR/atoms/schemas" "$CRATE_DIR/src/atoms" "$CRATE_DIR/tests"
# Track what we wrote so we can roll back on failure
CREATED=()
substitute() {
local src="$1" dest="$2"
# Record the target before writing so a failed sed is still rolled back
CREATED+=("$dest")
sed \
-e "s/__CRATE__/$CRATE/g" \
-e "s/__CRATE_SNAKE__/$CRATE_SNAKE/g" \
-e "s/__VERB__/$VERB/g" \
-e "s/__VERB_SNAKE__/$VERB_SNAKE/g" \
-e "s/__KIND__/$KIND/g" \
-e "s/__DESCRIPTION__/$DESCRIPTION_ESCAPED/g" \
"$src" > "$dest"
}
rollback() {
echo "rolling back — removing ${#CREATED[@]} generated files..." >&2
for f in "${CREATED[@]}"; do
rm -f "$f"
done
}
trap rollback ERR
substitute "$TEMPLATE_DIR/atoms/__VERB__.md.template" "$MD_OUT"
substitute "$TEMPLATE_DIR/atoms/schemas/__VERB__-input.json.template" "$IN_OUT"
substitute "$TEMPLATE_DIR/atoms/schemas/__VERB__-output.json.template" "$OUT_OUT"
substitute "$TEMPLATE_DIR/src/atoms/__VERB_SNAKE__.rs.template" "$RS_OUT"
substitute "$TEMPLATE_DIR/tests/__VERB_SNAKE___smoke.rs.template" "$TEST_OUT"
# Registering the atom module in src/atoms/mod.rs is left to Stream B
# refactor — on a freshly templated crate, src/atoms/mod.rs may not exist
# yet. The generator refuses to guess where to append.
trap - ERR
echo ""
echo "✓ Scaffolded atom $CRATE::$VERB ($KIND)"
echo ""
echo "Files created:"
for f in "${CREATED[@]}"; do
echo " ${f#"$ROOT"/}"
done
echo ""
echo "Next steps:"
echo " 1. Edit atoms/$VERB.md — fill description, examples, related[] wikilinks"
echo " 2. Edit atoms/schemas/$VERB-{input,output}.json — declare actual fields"
echo " 3. Implement src/atoms/$VERB_SNAKE.rs — replace NotImplemented with real logic"
echo " 4. Register: add 'pub mod $VERB_SNAKE;' to src/atoms/mod.rs"
echo " 5. cargo check -p $CRATE"
echo " 6. (once kei-schema-lint ships) kei-schema-lint $CRATE"