KeiSeiKit-1.0/_primitives/_rust/kei-memory/src/schema.rs
Parfii-bot 902fb3e81a feat(kei-memory): functional schema fix + 4-wave architecture refactor
Wave A — Functional ingest fix (root cause of empty Sleep reports):
- Rewrote TraceLine struct to match real Claude Code trace JSONL:
  type (was kind), timestamp ISO8601 (was epoch ts), message Object,
  cwd / gitBranch / parentUuid / uuid / subtype / toolUseID / toolUseResult
- New src/extract.rs: extract_tool_uses + extract_tool_result walks
  message.content[] for nested tool_use / tool_result blocks
- New src/classifier.rs: explicit table classifier (tool_error, user_correction,
  retry_loop, permission_denied, tool_use:<name>, ...) replaces shallow heuristic
- New src/error.rs: KeiMemoryError enum (IO/Parse/Db) replaces semantic
  mismatch where IO error was wrapped as rusqlite::InvalidParameterName
- New src/trace_line.rs: TraceLine + helpers (cube extraction)
- Schema migration v3: events.cwd column + 3 hot-query indices
  (events.tool, events.file_path, events.ts) + UNIQUE on patterns
- New tests/ingest_real_trace.rs: synth-fixture asserts tool/file/cwd/class extraction
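
For illustration only, the "explicit table classifier" idea above can be sketched with the standard library alone. The `EventFacts` struct, the rule predicates, and the `other` fallback are assumptions made for this sketch, not the contents of src/classifier.rs; `retry_loop` is left out because it needs state across several events.

```rust
/// Hypothetical, simplified view of one ingested event (not the real TraceLine).
pub struct EventFacts<'a> {
    pub is_error: bool,
    pub tool: Option<&'a str>,
    pub text: &'a str,
}

/// One table row: the class label and the predicate that selects it.
type Rule = (&'static str, fn(&EventFacts<'_>) -> bool);

fn is_permission_denied(e: &EventFacts<'_>) -> bool {
    e.is_error && e.text.contains("permission denied")
}

fn is_tool_error(e: &EventFacts<'_>) -> bool {
    e.is_error && e.tool.is_some()
}

/// Rules are checked top to bottom; the first match wins, so the
/// more specific rule is listed before the more general one.
const RULES: &[Rule] = &[
    ("permission_denied", is_permission_denied),
    ("tool_error", is_tool_error),
];

/// Classify an event by walking the table, then fall back to
/// `tool_use:<name>` (or the assumed label `other`) when no rule matches.
pub fn classify(e: &EventFacts<'_>) -> String {
    for (class, pred) in RULES {
        if pred(e) {
            return (*class).to_string();
        }
    }
    match e.tool {
        Some(name) => format!("tool_use:{name}"),
        None => "other".to_string(),
    }
}
```

Keeping the rules in one ordered table makes the precedence between classes visible in a single place, which is the property a shallow heuristic tends to hide.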

Wave B — Lib crate split:
- Cargo.toml: [lib] target added alongside existing [[bin]]
- src/lib.rs: pub re-export of all 18 modules
- src/main.rs: 11 mod declarations replaced by single use kei_memory::{…}
- tests/integration.rs: #[path] hack replaced by use kei_memory::{…}

Wave C — TF-IDF dedup + single-JOIN + filter_map fix:
- Schema migration v2: tokens.idf_dirty column + flag-based dedup
- index_document no longer triggers per-call recompute_idf rebuild
- top_similar uses single JOIN via vectors_for_overlapping_sessions helper
  (was N round-trips, one session_vector per candidate)
- All filter_map(|r| r.ok()) row-error swallowing replaced with ? propagation
- New tests/tfidf_idf_dedup.rs: 4 tests covering dedup behaviour, IDF emptiness,
  JOIN-pruning, empty-query safety
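
As a rough illustration of the filter_map fix above, the following stdlib-only sketch contrasts the two shapes. `RowError` is a hypothetical stand-in for the real row-decoding error type, and the function names are invented for this example.

```rust
/// Hypothetical stand-in for a row-decoding error.
#[derive(Debug, Clone, PartialEq)]
pub struct RowError(pub String);

/// Old shape: rows that fail to decode are silently dropped,
/// so the caller cannot tell a short result from a broken one.
pub fn collect_lossy(rows: Vec<Result<i64, RowError>>) -> Vec<i64> {
    rows.into_iter().filter_map(|r| r.ok()).collect()
}

/// New shape: the first failing row aborts the whole read and the
/// error is propagated to the caller with `?`.
pub fn collect_strict(rows: Vec<Result<i64, RowError>>) -> Result<Vec<i64>, RowError> {
    let mut out = Vec::new();
    for row in rows {
        out.push(row?);
    }
    Ok(out)
}
```

Given the rows `Ok(1)`, `Err(..)`, `Ok(3)`, the lossy version returns two values and hides the failure, while the strict version returns the error.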

Wave D — Commands split + nits:
- New src/dump.rs (43 LOC) + src/stats.rs (33 LOC):
  CLI renderers extracted from commands.rs (was inline SQL + format)
- src/commands.rs: thin wrappers, -42 LOC
- src/injection_guard.rs: inline tests removed (-26 LOC), file under 200 LOC threshold
- tests/injection_guard_unit.rs (new): 4 tests in proper integration crate
- src/patterns.rs: INSERT replaced with INSERT...ON CONFLICT...DO UPDATE
  (idempotent re-ingest, uses Wave A's UNIQUE index)
- src/analyze.rs + src/coaccess.rs: filter_map row-error fixes
- src/coaccess.rs: misleading PK comment rewritten
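
The idempotent re-ingest property of the patterns UPSERT can be modelled without a database. This is an in-memory analogy using a `HashMap`, not the SQL in src/patterns.rs; the type names are hypothetical, and the update rule (replace `count`, widen the seen-timestamp range) is an assumption, since the commit does not state which columns `DO UPDATE` touches.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct PatternRow {
    pub count: i64,
    pub first_seen_ts: i64,
    pub last_seen_ts: i64,
}

/// In-memory model of a table with a unique key on (event_class, session_id).
#[derive(Default)]
pub struct PatternTable {
    rows: HashMap<(String, String), PatternRow>,
}

impl PatternTable {
    /// Insert a row, or update the existing one on a key conflict.
    /// Assumed update rule: overwrite `count`, keep the earliest
    /// `first_seen_ts` and the latest `last_seen_ts`.
    pub fn upsert(&mut self, event_class: &str, session_id: &str, count: i64, first_ts: i64, last_ts: i64) {
        let key = (event_class.to_string(), session_id.to_string());
        self.rows
            .entry(key)
            .and_modify(|row| {
                row.count = count;
                row.first_seen_ts = row.first_seen_ts.min(first_ts);
                row.last_seen_ts = row.last_seen_ts.max(last_ts);
            })
            .or_insert(PatternRow { count, first_seen_ts: first_ts, last_seen_ts: last_ts });
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }

    pub fn get(&self, event_class: &str, session_id: &str) -> Option<&PatternRow> {
        self.rows.get(&(event_class.to_string(), session_id.to_string()))
    }
}
```

Calling `upsert` twice with the same arguments leaves exactly one row, which is the behaviour a plain INSERT cannot give.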

Verify-before-commit (RULE 0.13 §"Verify-before-commit"):
- cargo check --all-targets: PASS (1 unrelated dead-code warning)
- cargo test: 42 passed, 0 failed across 9 test binaries
- STATUS-TRUTH markers aggregated at .claude/agents/_merge/kei-memory-2026-05-01/

Architect-spotted ARCH-MAJOR + ARCH-MINOR + ARCH-NIT findings addressed:
- ARCH-MAJOR Cargo.toml binary-only (Wave B)
- ARCH-MAJOR schema missing indices (Wave A v3)
- ARCH-MAJOR ingest_jsonl choke point (Wave A — extract.rs + classifier.rs)
- ARCH-MAJOR idf O(N·V) per-call rebuild (Wave C)
- ARCH-MINOR patterns no UPSERT (Wave D)
- ARCH-MINOR commands.rs houses dump+stats (Wave D)
- ARCH-MINOR classifier silent contract (Wave A)
- ARCH-MINOR IO error wrapped as rusqlite (Wave A)
- ARCH-MINOR injection_guard inline tests (Wave D)
- ARCH-MINOR tfidf top_similar N round-trips (Wave C)
- ARCH-NIT 3× filter_map(|r| r.ok()) sites (Wave C + D)
- ARCH-NIT coaccess misleading comment (Wave D)

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
cargo-test: PASS (42 tests, 0 failures)
behaviour-verified: yes
follow-up-required:
  - tests/ingest_guard_tests.rs + tests/guard_test_corpus.rs still on #[path] hack (Wave B follow-up note, ~5 LOC)
  - dead_code warning Severity::Warn unused (pre-existing, not blocking)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:10:06 +08:00


//! SQL schema for the kei-memory offline analyzer.
//!
//! Constructor Pattern: schema + migration runner, no business logic.
//! DB default path: `~/.claude/memory/kei-memory.sqlite`.
//! Any structural change MUST append a new migration; never edit history.
use rusqlite::{Connection, Result};
/// Ordered migrations. Schema version = index + 1 (`MIGRATIONS[0]` is v1). Never reorder.
pub const MIGRATIONS: &[&str] = &[
    // v1: initial schema (RULE 0.14, 2026-04-22)
    "CREATE TABLE IF NOT EXISTS sessions (
        id TEXT PRIMARY KEY,
        started_ts INTEGER NOT NULL,
        ended_ts INTEGER,
        tool_call_count INTEGER NOT NULL DEFAULT 0,
        error_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        ts INTEGER NOT NULL,
        kind TEXT NOT NULL,
        tool TEXT,
        file_path TEXT,
        is_error INTEGER NOT NULL DEFAULT 0,
        event_class TEXT,
        message TEXT,
        FOREIGN KEY(session_id) REFERENCES sessions(id)
    );
    CREATE INDEX IF NOT EXISTS idx_events_session ON events(session_id);
    CREATE INDEX IF NOT EXISTS idx_events_class ON events(event_class);
    CREATE TABLE IF NOT EXISTS coaccess (
        file_a TEXT NOT NULL,
        file_b TEXT NOT NULL,
        count INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY(file_a, file_b)
    );
    CREATE TABLE IF NOT EXISTS tokens (
        session_id TEXT NOT NULL,
        token TEXT NOT NULL,
        tf INTEGER NOT NULL,
        PRIMARY KEY(session_id, token)
    );
    CREATE TABLE IF NOT EXISTS idf (
        token TEXT PRIMARY KEY,
        df INTEGER NOT NULL,
        idf REAL NOT NULL
    );
    CREATE TABLE IF NOT EXISTS patterns (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_class TEXT NOT NULL,
        session_id TEXT NOT NULL,
        count INTEGER NOT NULL,
        first_seen_ts INTEGER NOT NULL,
        last_seen_ts INTEGER NOT NULL
    );
    CREATE INDEX IF NOT EXISTS idx_patterns_class ON patterns(event_class);
    CREATE TABLE IF NOT EXISTS backlog (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        ts INTEGER NOT NULL,
        item TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0
    );",
    // v2: TF-IDF dedup. Marks token rows that need IDF recomputation
    // (RULE 0.16 / Wave C, 2026-05-01). Default 1 so existing rows force
    // a one-time recompute on first stale-check after upgrade.
    "ALTER TABLE tokens ADD COLUMN idf_dirty INTEGER NOT NULL DEFAULT 1;",
    // v3: Wave A schema fix (2026-05-01):
    //   * `events.cwd`: pulled from the real Claude Code trace `cwd` field.
    //     Lets retrospectives bucket by working directory.
    //   * Hot-query indices on tool / file_path / ts.
    //   * UNIQUE index on patterns(event_class, COALESCE(session_id, ''))
    //     enables the UPSERT planned for Wave D pattern persistence.
    "ALTER TABLE events ADD COLUMN cwd TEXT;
    CREATE INDEX IF NOT EXISTS idx_events_tool ON events(tool) WHERE tool IS NOT NULL;
    CREATE INDEX IF NOT EXISTS idx_events_file_path ON events(file_path) WHERE file_path IS NOT NULL;
    CREATE INDEX IF NOT EXISTS idx_events_ts ON events(ts);
    CREATE UNIQUE INDEX IF NOT EXISTS idx_patterns_class_session ON patterns(event_class, COALESCE(session_id, ''));",
];
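/// Apply all pending migrations. Stores version in `PRAGMA user_version`.
pub fn migrate(conn: &Connection) -> Result<()> {
    // A failed read of `user_version` is treated as version 0,
    // so every migration is attempted.
    let current: i64 = conn
        .query_row("PRAGMA user_version", [], |r| r.get(0))
        .unwrap_or(0);
    for (i, sql) in MIGRATIONS.iter().enumerate() {
        // Migration at index i brings the schema to version i + 1.
        let target = (i + 1) as i64;
        if current < target {
            conn.execute_batch(sql)?;
            // Bump the stored version only after the batch succeeded.
            conn.pragma_update(None, "user_version", target)?;
        }
    }
    Ok(())
}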
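
The version-gating loop in `migrate` can be exercised without SQLite. The sketch below is a dependency-free model, assuming a hypothetical `FakeDb` that only records which migrations ran; it mirrors the `current < target` check but executes no SQL.

```rust
/// Hypothetical in-memory stand-in for a database that tracks `user_version`.
#[derive(Default)]
pub struct FakeDb {
    pub user_version: i64,
    pub applied: Vec<&'static str>,
}

/// Same gating logic as `migrate`: the migration at index i is applied
/// only when the stored version is below i + 1.
pub fn migrate_model(db: &mut FakeDb, migrations: &[&'static str]) {
    let current = db.user_version;
    for (i, sql) in migrations.iter().enumerate() {
        let target = (i + 1) as i64;
        if current < target {
            db.applied.push(*sql); // stands in for execute_batch
            db.user_version = target; // stands in for pragma_update
        }
    }
}
```

Running the model twice on a fresh `FakeDb` applies each migration once, and a database already at version 2 receives only the third entry. This is why appending to `MIGRATIONS` is safe while reordering or editing earlier entries is not.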