External reviewer raised 7 overclaim/scope concerns. Agents verified each
against source; this commit applies all fixes that landed in docs.
Honesty pass:
- README:25-29 — Cortex daemon track listed as alpha (was beta); MCP server
marked "alpha (unpublished) — install via local dist build"; Phase B
noted "auto-codification not yet wired (manual via /escalate-recurrence)";
keigit framed as author-operated mirror (KeiSei84 / private Forgejo),
not neutral community service
- README:95-97 — Cortex CLI/daemon track downgraded beta→alpha
with rationale (browser-app + VSCode-extension are concept-level)
- docs/ARCHITECTURE.md — added "Model router — current state (2026-05-03)"
subsection: per-call fixed estimate routing, NO 100-row Bayesian threshold
in current source (select.rs:74-124); reviewer suggestion deferred
- docs/SLEEP-LAYER.md — added Phase B scope clarification: morning report
is read-only markdown, no auto-codification path
- docs/PUBLISHING.md — aligned framing with README:43 ("author-operated
mirror" not "community registry"); added vendor-neutrality note that
substrate works against any npm-compatible registry
- mcp-server/package.json — added "private": true and description note
to prevent accidental publish before maturity gate
Portable format specs (reviewer asked for memory-repo agnosticism):
- docs/MEMORY-FORMAT.md (196 LOC) — JSONL schemas for traces / decisions /
agent-events with jq/awk/pandas recipes, grounded in actual writers
- docs/DNA-FORMAT.md (159 LOC) — DNA wire format ("type::caps::sha8")
with shell+python parsers
- docs/LEDGER-SCHEMA.md (199 LOC) — full SQLite DDL (agents +
skill_invocations + indexes + triggers) with sample queries
Auto-regen artifact:
- docs/DNA-INDEX.md — kei-registry regenerated count 564→565
Verification:
- All claims traced to file:line in source by agent a52b29ae
- All new docs ≤200 LOC per Constructor Pattern
- Reality verification verdicts: README/MCP/Phase-B/Cortex VERIFIED;
Bayesian-router PARTIAL (overclaim removed); keigit PARTIAL (framing
fixed in this commit); memory-format VERIFIED-FALSE (spec added)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DNA-FORMAT — Portable Specification
How to parse and compute DNA strings without compiling any Rust. SSoT (single source of truth): _primitives/_rust/kei-shared/src/dna.rs and kei-runtime-core/src/dna.rs, as of 2026-05-02.
Section 1 — Wire format
<role>::<caps>::<scope_sha8>::<body_sha8>-<nonce8>
Example:
vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01
Segment table
| Segment | Separator | Length | Character class | Semantics |
|---|---|---|---|---|
| role | :: prefix | 1+ chars | Any non-empty string | Agent role slug (e.g. vm-managed, code-implementer) |
| caps | :: | 1+ chars | Any non-empty, tags joined by - | Capability tags (e.g. HZ-CX22-NB, EM) |
| scope_sha8 | :: | exactly 8 | ASCII hex [0-9A-Fa-f] | First 4 bytes of SHA-256 of scope input |
| body_sha8 | none | exactly 8 | ASCII hex | First 4 bytes of SHA-256 of body input |
| - | literal - | 1 | - | Separates body_sha8 from nonce |
| nonce8 | end of string | exactly 8 | ASCII hex [0-9a-f] | Random 4 bytes, lowercase, per-spawn |
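Split rule: splitting on :: yields exactly four parts, parts[0] through parts[3] (role, caps, scope_sha8, and a tail). The tail, parts[3], is then split on its last - into body_sha8 and nonce8.
Total minimum wire length: role (1) + :: (2) + caps (1) + :: (2) + scope_sha8 (8) + :: (2) + body_sha8 (8) + - (1) + nonce8 (8) = 33 chars.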
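The split rule and the 33-char floor above translate directly into a few lines of Python. This is a sketch. split_dna is a hypothetical helper name, and it applies only the split rule and the length bound. Per-segment character-class checks are left to the regex parser in Section 5.

```python
def split_dna(dna: str) -> tuple[str, str, str, str, str]:
    """Hypothetical helper: apply the split rule to a DNA string."""
    # 1 + 2 + 1 + 2 + 8 + 2 + 8 + 1 + 8 = 33 is the shortest legal DNA
    if len(dna) < 33:
        raise ValueError(f"DNA shorter than 33 chars: {dna!r}")
    parts = dna.split("::")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '::'-separated parts, got {len(parts)}")
    role, caps, scope_sha8, tail = parts
    # Split the tail on its LAST '-' into body_sha8 and nonce8
    body_sha8, sep, nonce8 = tail.rpartition("-")
    if not sep:
        raise ValueError("missing '-' between body_sha8 and nonce8")
    return role, caps, scope_sha8, body_sha8, nonce8


print(split_dna("vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01"))
# ('vm-managed', 'HZ-CX22-NB', 'A1B2C3D4', 'DEADBEEF', 'c0ffee01')

# Smallest possible DNA: 1-char role and caps, all-zero hex segments
min_dna = "r::c::" + "0" * 8 + "::" + "0" * 8 + "-" + "0" * 8
print(len(min_dna))  # 33
```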
Section 2 — Computing scope_sha8
scope_sha8 is the first 8 uppercase hex chars of SHA-256(scope_input).
scope_input is arbitrary bytes representing "what task class is this" — typically a canonical URL, manifest path, or task description. The exact content is caller-defined; the only contract is that the same bytes always yield the same hash.
Worked example (Python)
import hashlib
scope_input = b"keiseikit.dev/vms/hetzner/nbg1"
digest = hashlib.sha256(scope_input).digest()
scope_sha8 = digest[:4].hex().upper() # first 4 bytes → 8 hex chars
print(scope_sha8)  # 8 uppercase hex chars; "3F7A2C11" is only an illustrative placeholder
Worked example (shell)
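printf '%s' "keiseikit.dev/vms/hetzner/nbg1" \
  | sha256sum \
  | cut -c1-8 \
  | tr 'a-f' 'A-F'
# prints the same 8 uppercase hex chars as the Python example above
Note: printf '%s' is used instead of echo -n because some /bin/sh implementations print a literal "-n", which changes the hashed bytes. sha256sum outputs lowercase hex, so tr uppercases it to match Rust's format!("{:02X}"). Where sha256sum is not installed, shasum -a 256 is a drop-in replacement in this pipeline.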
Section 3 — Computing body_sha8
Identical algorithm to scope_sha8, applied to the body input bytes.
body_input represents "what is the substrate configuration" — typically a JSON manifest body, config struct, or similar content-addressable blob.
import hashlib
body_input = b'{"tier":"cx22","cloud_init_sha":"abc"}'
body_sha8 = hashlib.sha256(body_input).digest()[:4].hex().upper()
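body_sha8 hashes raw bytes. Two JSON documents that describe the same configuration but differ in key order or whitespace therefore produce different hashes. If logically equal configs are meant to share a task class, the caller has to canonicalize them first.

The sketch below assumes one such scheme: sorted keys, compact separators, UTF-8. This document does not say whether kei's writers canonicalize at all, so sha8 and canonical_body are illustrative names only.

```python
import hashlib
import json


def sha8(data: bytes) -> str:
    """First 4 bytes of SHA-256 as 8 uppercase hex chars (same rule as scope_sha8)."""
    return hashlib.sha256(data).digest()[:4].hex().upper()


def canonical_body(config: dict) -> bytes:
    """Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")


a = {"tier": "cx22", "cloud_init_sha": "abc"}
b = {"cloud_init_sha": "abc", "tier": "cx22"}  # same config, different key order

# Plain json.dumps keeps insertion order, so the hash inputs differ...
print(json.dumps(a) == json.dumps(b))  # False
# ...while the canonical bytes, and therefore body_sha8, are identical.
print(canonical_body(a))  # b'{"cloud_init_sha":"abc","tier":"cx22"}'
print(sha8(canonical_body(a)) == sha8(canonical_body(b)))  # True
```

The canonical bytes differ from the literal body_input above, so they yield a different body_sha8. Whichever convention a caller picks has to be used consistently, or the same-task-class property in Section 6 breaks.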
Section 4 — Nonce
nonce8 is 4 random bytes formatted as 8 lowercase hex chars.
A fresh nonce is generated on every agent spawn. It is not a security token: its sole purpose is to distinguish concurrent spawns of the same task class from each other in the ledger.
Python
import secrets
nonce8 = secrets.token_bytes(4).hex()  # always lowercase, 8 chars
Shell (openssl)
openssl rand -hex 4  # produces e.g. "c0ffee01"
The Rust source (kei-runtime-core/src/dna.rs::random_hex8_lower) uses rand::thread_rng().fill_bytes.
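Sections 2 to 4 can be combined into a single function that assembles a complete DNA string from its inputs. The sketch below shows one way to do it. make_dna and sha8 are hypothetical helper names, and the Rust builder in dna.rs is not reproduced here. The ':' check is an added assumption that keeps the output parseable by the Section 5 regex.

```python
import hashlib
import secrets


def sha8(data: bytes) -> str:
    # First 4 bytes of SHA-256 as 8 uppercase hex chars (Sections 2 and 3)
    return hashlib.sha256(data).digest()[:4].hex().upper()


def make_dna(role: str, caps: list[str], scope_input: bytes, body_input: bytes) -> str:
    """Hypothetical helper: build <role>::<caps>::<scope_sha8>::<body_sha8>-<nonce8>."""
    caps_str = "-".join(caps)  # capability tags joined by '-'
    if not role or not caps_str or ":" in role or ":" in caps_str:
        raise ValueError("role and caps must be non-empty and contain no ':'")
    nonce8 = secrets.token_bytes(4).hex()  # fresh per spawn, lowercase
    return f"{role}::{caps_str}::{sha8(scope_input)}::{sha8(body_input)}-{nonce8}"


scope = b"keiseikit.dev/vms/hetzner/nbg1"
body = b'{"tier":"cx22","cloud_init_sha":"abc"}'
d1 = make_dna("vm-managed", ["HZ", "CX22", "NB"], scope, body)
d2 = make_dna("vm-managed", ["HZ", "CX22", "NB"], scope, body)
print(d1[:-9] == d2[:-9])  # True: same inputs, same task class
```

Two calls with identical inputs differ only in their last 8 characters, unless the two random nonces happen to collide.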
Section 5 — Parsing in pure shell and Python
Shell (awk)
DNA="vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01"
echo "$DNA" | awk -F'::' '{
role=$1; caps=$2; scope_sha=$3
n=split($4, tail, "-")
nonce=tail[n]; body_sha=""
for(i=1;i<n;i++) body_sha=(i==1?tail[i]:body_sha"-"tail[i])
print "role="role, "caps="caps, "scope="scope_sha, "body="body_sha, "nonce="nonce
}'
# role=vm-managed caps=HZ-CX22-NB scope=A1B2C3D4 body=DEADBEEF nonce=c0ffee01
Python (regex)
import re
DNA_RE = re.compile(
r'^(?P<role>[^:]+)::(?P<caps>[^:]+)::(?P<scope_sha>[0-9A-Fa-f]{8})'
r'::(?P<body_sha>[0-9A-Fa-f]{8})-(?P<nonce>[0-9A-Fa-f]{8})$'
)
def parse_dna(s: str) -> dict:
m = DNA_RE.match(s)
if not m:
raise ValueError(f"invalid DNA: {s!r}")
return m.groupdict()
dna = parse_dna("vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01")
# {'role': 'vm-managed', 'caps': 'HZ-CX22-NB',
# 'scope_sha': 'A1B2C3D4', 'body_sha': 'DEADBEEF', 'nonce': 'c0ffee01'}
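DNA_RE accepts either hex case in all three hex segments. The writers described in Sections 2 to 4 emit only uppercase sha8 segments (format!("{:02X}")) and a lowercase nonce (random_hex8_lower), so a stricter check can flag hand-edited or foreign strings. This variant is an assumption layered on the spec, not a port of the Rust parser. STRICT_DNA_RE and is_canonical_dna are hypothetical names.

```python
import re

# Assumed strict form: uppercase sha8 segments, lowercase nonce.
STRICT_DNA_RE = re.compile(
    r"(?P<role>[^:]+)::(?P<caps>[^:]+)::(?P<scope_sha>[0-9A-F]{8})"
    r"::(?P<body_sha>[0-9A-F]{8})-(?P<nonce>[0-9a-f]{8})"
)


def is_canonical_dna(s: str) -> bool:
    """True only if the whole string matches with canonical hex casing."""
    return STRICT_DNA_RE.fullmatch(s) is not None


print(is_canonical_dna("vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01"))  # True
print(is_canonical_dna("vm-managed::HZ-CX22-NB::a1b2c3d4::DEADBEEF-c0ffee01"))  # False: lowercase scope
print(is_canonical_dna("vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-C0FFEE01"))  # False: uppercase nonce
```

fullmatch is used instead of a trailing $ because $ also matches just before a final newline.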
Section 6 — Same-task-class property
scope_sha8 and body_sha8 are deterministic: identical inputs always produce the same values. Only the nonce changes per spawn.
This means the task_class_dna (DNA with the -<nonce8> suffix stripped) is a stable per-task-class identifier:
DNA = "vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01"
task_class_dna = DNA[:-9]  # strip the trailing "-" plus the 8-char nonce
# "vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF"
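Only the nonce varies between spawns, so grouping raw DNA strings by task class needs nothing beyond string slicing. The sketch below uses made-up DNA values, and task_class is a hypothetical helper. rpartition is used here instead of a fixed [:-9] slice as a stylistic choice; for well-formed DNA the two give the same result.

```python
from collections import Counter

# Illustrative DNA strings (made up, not taken from a real ledger).
spawns = [
    "vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01",
    "vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-0badf00d",
    "code-implementer::EM::0011AABB::22CC33DD-12345678",
]


def task_class(dna: str) -> str:
    """Everything before the last '-', i.e. the DNA without its nonce."""
    return dna.rpartition("-")[0]


counts = Counter(task_class(d) for d in spawns)
for cls, n in counts.most_common():
    print(n, cls)
# 2 vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF
# 1 code-implementer::EM::0011AABB::22CC33DD
```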
The ledger (v9+) stores this as a VIRTUAL generated column task_class_dna for empirical posterior aggregation. Two spawns of the same task class will share the same prefix and differ only in nonce.
-- Find all spawns of a given task class across time:
SELECT id, started_ts, outcome
FROM agents
WHERE task_class_dna = 'vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF'
ORDER BY started_ts DESC;
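The generated-column idea can be tried without the real ledger by building a throwaway in-memory SQLite table. This is not the ledger DDL, which is documented in docs/LEDGER-SCHEMA.md. The dna column, the TEXT timestamps and the substr(...) expression are assumptions, chosen so that task_class_dna equals the DNA minus its last 9 characters. Generated columns require SQLite 3.31 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal table; only agents, id, started_ts, outcome and
# task_class_dna mirror names from the query above.
conn.execute("""
    CREATE TABLE agents (
        id INTEGER PRIMARY KEY,
        dna TEXT NOT NULL,
        started_ts TEXT NOT NULL,
        outcome TEXT,
        task_class_dna TEXT GENERATED ALWAYS AS (substr(dna, 1, length(dna) - 9)) VIRTUAL
    )
""")
conn.executemany(
    "INSERT INTO agents (dna, started_ts, outcome) VALUES (?, ?, ?)",
    [
        ("vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-c0ffee01", "2026-05-01T10:00:00Z", "ok"),
        ("vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF-0badf00d", "2026-05-02T09:30:00Z", "fail"),
    ],
)
# Per task class: number of spawns and number of 'ok' outcomes
rows = conn.execute(
    "SELECT task_class_dna, COUNT(*), SUM(outcome = 'ok') "
    "FROM agents GROUP BY task_class_dna"
).fetchall()
print(rows)
# [('vm-managed::HZ-CX22-NB::A1B2C3D4::DEADBEEF', 2, 1)]
```

A VIRTUAL column is computed on read and takes no storage. The trade-off is that the substr expression runs on every query that touches the column.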