KeiSeiKit-1.0/_assembler/src/registry_client.rs
Parfii-bot f135ece1ca feat(path-atoms): atomize ~/.claude memory + rules path references
Phase 1 of substrate-unified-registry: move all references to user
home memory/rules out of plain strings and into content-addressable
path atoms. Public artefacts now contain opaque `{path::NAME}/file.md`
references; the actual home prefix lives only in the path-atom file's
frontmatter, registered in the local kei-registry.

NEW path atoms (`_blocks/path-*.md`):
- `path-user-memory.md` → template `~/.claude/memory`
- `path-user-rules.md`  → template `~/.claude/rules`

Both files use frontmatter `type: atom, kind: path, template: ..., expand_at: render`.
BlockMdScanner auto-registers them; DNA index shows them under their
unprefixed names (`user-memory`, `user-rules`) for human lookup, while
the body sha8 makes them content-addressable.
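
As a rough illustration of what `expand_at: render` implies, the sketch below expands an opaque reference back into a concrete path using a name-to-template map. The helper name `expand_path_ref` and the in-memory map are assumptions made for illustration; this commit does not show the kit's actual expander.

```rust
use std::collections::HashMap;

/// Expand one opaque `{path::NAME}/suffix` reference.
/// Hypothetical helper: `templates` maps an atom name to its `template`
/// value (for example `user-rules` to `~/.claude/rules`).
/// Returns `None` for references that are not opaque path references
/// or whose atom name is unknown.
fn expand_path_ref(reference: &str, templates: &HashMap<&str, &str>) -> Option<String> {
    // Drop the opening marker, then split the atom name from the suffix.
    let rest = reference.strip_prefix("{path::")?;
    let (name, suffix) = rest.split_once('}')?;
    let template = templates.get(name)?;
    // `suffix` keeps its leading `/`, so plain concatenation is enough.
    Some(format!("{template}{suffix}"))
}
```

With `user-rules` mapped to `~/.claude/rules`, the reference `{path::user-rules}/ml-protocol.md` would expand to `~/.claude/rules/ml-protocol.md`, while an unknown atom name yields `None`.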

Resolver (`_assembler/src/registry_client.rs`):
- `is_path_atom(conn, name)` — checks DB by name + filename convention
  (`_blocks/path-<name>.md`) + frontmatter `kind: path`. Defensive:
  filename + frontmatter must BOTH agree.
- `frontmatter_has_kind_path(body)` — minimal YAML parser. Tolerates
  CRLF and quoted values; rejects substring matches (`pathological` ≠ `path`).
- 5 unit tests cover 2 positive + 3 negative cases.
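
The "both must agree" rule can be sketched without a database. The code below is a simplified, line-based stand-in, not the implementation in `registry_client.rs`; the names `kind_is_path` and `passes_both_gates` are hypothetical, and the registered path and file body are passed in directly instead of being looked up.

```rust
/// Simplified frontmatter gate: true when the leading `---` block is
/// closed and its first `kind` key has the exact value `path`.
fn kind_is_path(body: &str) -> bool {
    // `lines()` strips both `\n` and `\r\n`, so CRLF files need no special case.
    let mut lines = body.lines();
    if lines.next() != Some("---") {
        return false;
    }
    let mut kind: Option<&str> = None;
    for line in lines {
        let line = line.trim();
        if line == "---" {
            // Closing fence reached: decide on the first `kind` value seen.
            return kind == Some("path");
        }
        if kind.is_none() {
            if let Some(rest) = line.strip_prefix("kind:") {
                kind = Some(rest.trim().trim_matches(&['\'', '"'][..]));
            }
        }
    }
    false // opening fence was never closed
}

/// Both gates: filename convention AND explicit frontmatter declaration.
fn passes_both_gates(registered_path: &str, name: &str, body: &str) -> bool {
    registered_path.ends_with(&format!("/_blocks/path-{name}.md")) && kind_is_path(body)
}
```

Under these assumptions a CRLF file named `path-user-rules.md` with `kind: path` passes, while the same body registered under `user-rules.md`, or a body declaring `kind: pathological`, does not.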

Resolver wire-up (`_assembler/src/assembler.rs:147 write_references`):
- For each `references.extra` entry starting with `path:NAME/...`:
  - Lookup `NAME` via `is_path_atom`.
  - On success: emit `{path::NAME}/<suffix>` — opaque, kit-resolvable.
  - On miss: stderr warn + passthrough. Never fatal.
- Non-`path:` refs pass through unchanged. Backward compatible.
- 2 unit tests cover passthrough paths.
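
The rewrite rule above can be sketched as a single function. This is an illustration only: `rewrite_reference` is a hypothetical name, the registry lookup is replaced by a closure so no database is needed, the warning text is invented, and the lookup's error path is not modelled.

```rust
/// Rewrite one `references.extra` entry.
/// `is_path_atom` stands in for the registry lookup (assumption: a plain
/// predicate; the real function returns `Result<bool, String>`).
fn rewrite_reference<F>(entry: &str, is_path_atom: F) -> String
where
    F: Fn(&str) -> bool,
{
    // Only entries shaped like `path:NAME/suffix` are candidates.
    let Some((name, suffix)) = entry
        .strip_prefix("path:")
        .and_then(|rest| rest.split_once('/'))
    else {
        return entry.to_string(); // non-`path:` refs pass through unchanged
    };
    if is_path_atom(name) {
        // Hit: emit the opaque, kit-resolvable form.
        format!("{{path::{name}}}/{suffix}")
    } else {
        // Miss: warn on stderr and pass the entry through. Never fatal.
        eprintln!("warn: '{name}' is not a registered path-atom; leaving '{entry}' as-is");
        entry.to_string()
    }
}
```

For example, with a predicate that accepts only `user-rules`, the entry `path:user-rules/ml-protocol.md` becomes `{path::user-rules}/ml-protocol.md`, and `path:unknown/x.md` is returned unchanged after a warning.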

Manifest migration (38 manifests touched):
- `~/.claude/rules/<file>` → `path:user-rules/<file>`
- `~/.claude/memory/<file>` → `path:user-memory/<file>`
- 96 references migrated; 1 prose-style reference in security-auditor
  left as plain text (lives inside a domain_in description, not in
  references.extra — out of scope for this resolver).
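
The two prefix mappings can be expressed as a small function. The commit does not say how the migration was carried out (script or manual edit), so `migrate_reference` below is purely a hypothetical illustration of the mapping itself.

```rust
/// Map a pre-migration home-relative reference to its `path:` form.
/// References that match neither prefix are returned unchanged.
fn migrate_reference(reference: &str) -> String {
    // (old prefix, new prefix) pairs taken from the mapping listed above.
    const MAPPINGS: [(&str, &str); 2] = [
        ("~/.claude/rules/", "path:user-rules/"),
        ("~/.claude/memory/", "path:user-memory/"),
    ];
    for (old, new) in MAPPINGS {
        if let Some(file) = reference.strip_prefix(old) {
            return format!("{new}{file}");
        }
    }
    reference.to_string()
}
```

Under this sketch `~/.claude/rules/ml-protocol.md` maps to `path:user-rules/ml-protocol.md`, and a reference outside both directories is left untouched.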

Regenerated 38 `_generated/*.md` + 1 new `frontend-validator.md`.
Regenerated `docs/DNA-INDEX.md` (now includes 2 path-atoms by name).

Verification (cited):
- `git ls-files | grep denisparfionovich` → 0 hits outside allowlist
  (NOTICE/README byline + `.github/workflows/leak-check.yml` detection
  rule).
- `_generated/` contains 99 occurrences of `{path::user-...}/`.
- assembler tests: 29 passed (5 new). kei-registry tests: 10 passed
  (8 short_path from earlier commit + 2 unrelated).
- assembler resolver verified end-to-end: ml-implementer.md lines
  479-485 show `{path::user-rules}/ml-protocol.md` etc.

What this does NOT do (deferred):
- No registry-DB schema change. Path atoms ride the existing Atom block
  type via convention, not via a new `BlockType::PathAtom` variant.
- No git-branch tracking (Phase 2 of plan).
- No `kei-registry status` cross-cutting CLI (Phase 3 of plan).
- No path-atom orphan detection CLI (Phase 4).

The `path:user-memory` and `path:user-rules` atoms cover 100% of the
username-leak surface in the current manifest set; future categories
(kit-root, registry-db, sync-repo, secrets-env, project-root) can
land additively without architectural changes.

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
behaviour-verified: yes
follow-up-required:
  - Phase 2 (git-branch tracker hook)
  - Phase 3 (kei-registry status subcommand)
  - Phase 4 (orphan detection CLI)
  - Sync user-side install: ~/.claude/agents/_manifests/ still has
    pre-migration absolute paths; will pick up new format on next
    `install.sh --add` (out of scope for this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:29:50 +08:00


//! Thin read-only client over `~/.claude/registry.sqlite`.
//!
//! Fetches rule-fragment content by logical name (`rule::section`).
//! The registry stores the real filesystem path; this module reads that path.
//!
//! Constructor Pattern: one responsibility — lookup + read fragment body.
//! No writes. No schema migration. Opens DB read-only.
use rusqlite::{Connection, OpenFlags};
use std::path::{Path, PathBuf};

/// Open the registry at `db_path` in read-only mode.
pub fn open_read_only(db_path: &Path) -> Result<Connection, String> {
    Connection::open_with_flags(db_path, OpenFlags::SQLITE_OPEN_READ_ONLY)
        .map_err(|e| format!("open registry {}: {e}", db_path.display()))
}

/// Default path: `$KEI_REGISTRY_DB` (if set) or `~/.claude/registry.sqlite`.
pub fn default_db_path() -> PathBuf {
    if let Some(v) = std::env::var_os("KEI_REGISTRY_DB") {
        return PathBuf::from(v);
    }
    // With `HOME` unset this yields the relative path `.claude/registry.sqlite`.
    let home = std::env::var_os("HOME").unwrap_or_default();
    PathBuf::from(home).join(".claude/registry.sqlite")
}

/// Look up a rule fragment by `name` (e.g. `"karpathy-behavioral::1-think-before-coding"`).
///
/// Returns:
/// - `Ok(Some(body))` — fragment found and file readable.
/// - `Ok(None)` — name not in registry, or registry path does not exist on disk.
///   Caller should warn-and-skip.
/// - `Err(msg)` — DB query failure (not a missing-path issue). Propagate.
pub fn find_rule(conn: &Connection, name: &str) -> Result<Option<String>, String> {
    let path = match query_path(conn, name)? {
        Some(p) => p,
        None => return Ok(None),
    };
    read_fragment_body(name, &path)
}

/// Query the `path` column for the active row with `name` and `block_type='rule'`.
fn query_path(conn: &Connection, name: &str) -> Result<Option<String>, String> {
    let mut stmt = conn
        .prepare(
            "SELECT path FROM blocks \
             WHERE name = ?1 AND block_type = 'rule' AND superseded_by IS NULL \
             LIMIT 1",
        )
        .map_err(|e| format!("prepare query for {name}: {e}"))?;
    let row: Option<String> = stmt
        .query_row(rusqlite::params![name], |r| r.get(0))
        .optional()
        .map_err(|e| format!("query registry for {name}: {e}"))?;
    Ok(row)
}

/// Read the fragment body from `path`. Returns `Ok(None)` when the file is absent.
fn read_fragment_body(name: &str, path: &str) -> Result<Option<String>, String> {
    match std::fs::read_to_string(path) {
        Ok(body) => Ok(Some(body)),
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
            eprintln!(
                "warn [assembler]: registry fragment for '{name}' has path '{path}' but file is missing — skipping. \
                 Run `kei-decompose decompose-rules --rebuild-fragments` to restore."
            );
            Ok(None)
        }
        Err(e) => Err(format!("read fragment for {name} at {path}: {e}")),
    }
}

/// Local equivalent of `rusqlite::OptionalExtension`: maps
/// `QueryReturnedNoRows` to `Ok(None)` and leaves other errors intact.
trait OptionalExt<T>: Sized {
    fn optional(self) -> rusqlite::Result<Option<T>>;
}

impl<T> OptionalExt<T> for rusqlite::Result<T> {
    fn optional(self) -> rusqlite::Result<Option<T>> {
        match self {
            Ok(v) => Ok(Some(v)),
            Err(rusqlite::Error::QueryReturnedNoRows) => Ok(None),
            Err(e) => Err(e),
        }
    }
}

/// Check if `name` is a registered path-atom.
///
/// Convention: a path-atom is an atom whose source file is
/// `_blocks/path-<name>.md` and whose YAML frontmatter declares
/// `kind: path`. The DB stores only the file path (not body), so this
/// function uses the filename convention as a fast first check, then
/// reads the file and parses the frontmatter to confirm `kind: path`.
///
/// Returns:
/// - `Ok(true)` — atom registered under `name`, file exists, frontmatter
///   declares `kind: path`. Caller may emit an opaque resolved reference.
/// - `Ok(false)` — atom not found, or found but not a path-atom. Caller
///   should pass the original reference through unchanged, optionally
///   warning first.
/// - `Err(msg)` — DB query failure. Propagate.
pub fn is_path_atom(conn: &Connection, name: &str) -> Result<bool, String> {
    let mut stmt = conn
        .prepare(
            "SELECT path FROM blocks \
             WHERE name = ?1 AND block_type = 'atom' AND superseded_by IS NULL \
             LIMIT 1",
        )
        .map_err(|e| format!("prepare path-atom query for {name}: {e}"))?;
    let path: Option<String> = stmt
        .query_row(rusqlite::params![name], |r| r.get(0))
        .optional()
        .map_err(|e| format!("query path-atom {name}: {e}"))?;
    let Some(p) = path else { return Ok(false) };
    // Filename convention check: `_blocks/path-<name>.md`. A cheap suffix
    // comparison that avoids the file read in the common non-path-atom case.
    let expected_suffix = format!("/_blocks/path-{name}.md");
    if !p.ends_with(&expected_suffix) {
        return Ok(false);
    }
    // Read frontmatter to confirm `kind: path`. Defensive: the filename
    // convention is not authoritative on its own; the explicit declaration is.
    let body = match std::fs::read_to_string(&p) {
        Ok(b) => b,
        // Any read failure (missing file, permissions) means "not a path-atom".
        Err(_) => return Ok(false),
    };
    Ok(frontmatter_has_kind_path(&body))
}

/// Return true if `body` starts with a YAML frontmatter block (`---\n...---\n`)
/// containing a line whose key is `kind` and value is `path`. Tolerates
/// `---\r\n`, surrounding whitespace, and YAML quoting.
fn frontmatter_has_kind_path(body: &str) -> bool {
    let stripped = match body
        .strip_prefix("---\n")
        .or_else(|| body.strip_prefix("---\r\n"))
    {
        Some(s) => s,
        None => return false,
    };
    let end = match stripped
        .find("\n---\n")
        .or_else(|| stripped.find("\r\n---\r\n"))
    {
        Some(i) => i,
        None => return false,
    };
    let frontmatter = &stripped[..end];
    for line in frontmatter.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("kind:") {
            let val = rest.trim().trim_matches(&['\'', '"'][..]);
            return val == "path";
        }
    }
    false
}

#[cfg(test)]
mod tests {
    use super::frontmatter_has_kind_path;

    #[test]
    fn detects_kind_path_in_frontmatter() {
        let body = "---\ntype: atom\nkind: path\nname: foo\n---\n\n# body\n";
        assert!(frontmatter_has_kind_path(body));
    }

    #[test]
    fn rejects_kind_other() {
        let body = "---\ntype: atom\nkind: other\n---\n";
        assert!(!frontmatter_has_kind_path(body));
    }

    #[test]
    fn rejects_no_frontmatter() {
        let body = "# just markdown\n";
        assert!(!frontmatter_has_kind_path(body));
    }

    #[test]
    fn tolerates_quoted_value() {
        let body = "---\nkind: \"path\"\n---\n";
        assert!(frontmatter_has_kind_path(body));
    }

    #[test]
    fn rejects_kind_path_substring() {
        // `kind: pathological` must NOT match `kind: path`.
        let body = "---\nkind: pathological\n---\n";
        assert!(!frontmatter_has_kind_path(body));
    }
}