KeiSeiKit-1.0/_primitives/_rust/kei-memory/tests/integration.rs
Parfii-bot 902fb3e81a feat(kei-memory): functional schema fix + 4-wave architecture refactor
Wave A — Functional ingest fix (root cause of empty Sleep reports):
- Rewrote TraceLine struct to match real Claude Code trace JSONL:
  type (was kind), timestamp ISO8601 (was epoch ts), message Object,
  cwd / gitBranch / parentUuid / uuid / subtype / toolUseID / toolUseResult
- New src/extract.rs: extract_tool_uses + extract_tool_result walks
  message.content[] for nested tool_use / tool_result blocks
- New src/classifier.rs: explicit table classifier (tool_error, user_correction,
  retry_loop, permission_denied, tool_use:<name>, ...) replaces shallow heuristic
- New src/error.rs: KeiMemoryError enum (IO/Parse/Db) replaces semantic
  mismatch where IO error was wrapped as rusqlite::InvalidParameterName
- New src/trace_line.rs: TraceLine + helpers (cube extraction)
- Schema migration v3: events.cwd column + 3 hot-query indices
  (events.tool, events.file_path, events.ts) + UNIQUE on patterns
- New tests/ingest_real_trace.rs: synth-fixture asserts tool/file/cwd/class extraction
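
The dedicated error enum from Wave A can be pictured roughly as follows. This is a minimal sketch using only the standard library: the variant payloads, the `Display` wording, and the helper `count_trace_lines` are assumptions made for illustration, and the `Db` variant holds a `String` here where the real crate presumably wraps a rusqlite error.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the crate's error type (payloads are assumed).
#[allow(dead_code)]
#[derive(Debug)]
pub enum KeiMemoryError {
    Io(io::Error),
    Parse(String),
    Db(String), // assumption: the real variant likely carries a database error
}

impl fmt::Display for KeiMemoryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KeiMemoryError::Io(e) => write!(f, "io error: {e}"),
            KeiMemoryError::Parse(m) => write!(f, "parse error: {m}"),
            KeiMemoryError::Db(m) => write!(f, "db error: {m}"),
        }
    }
}

impl std::error::Error for KeiMemoryError {}

// Lets `?` convert an io::Error into the Io variant instead of
// disguising it as an unrelated database error.
impl From<io::Error> for KeiMemoryError {
    fn from(e: io::Error) -> Self {
        KeiMemoryError::Io(e)
    }
}

/// Hypothetical helper: counts non-empty lines in a trace file.
pub fn count_trace_lines(path: &Path) -> Result<usize, KeiMemoryError> {
    let text = fs::read_to_string(path)?; // io::Error becomes KeiMemoryError::Io
    if text.trim().is_empty() {
        return Err(KeiMemoryError::Parse("empty trace".to_string()));
    }
    Ok(text.lines().filter(|l| !l.trim().is_empty()).count())
}
```

With this shape a caller can match on the variant to tell a missing file apart from a malformed line, which the previous wrapping as a rusqlite error did not allow.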

Wave B — Lib crate split:
- Cargo.toml: [lib] target added alongside existing [[bin]]
- src/lib.rs: pub re-export of all 18 modules
- src/main.rs: 11 mod declarations replaced by single use kei_memory::{…}
- tests/integration.rs: #[path] hack replaced by use kei_memory::{…}

Wave C — TF-IDF dedup + single-JOIN + filter_map fix:
- Schema migration v2: tokens.idf_dirty column + flag-based dedup
- index_document no longer triggers per-call recompute_idf rebuild
- top_similar uses single JOIN via vectors_for_overlapping_sessions helper
  (was N round-trips, one session_vector per candidate)
- All filter_map(|r| r.ok()) row-error swallowing replaced with ? propagation
- New tests/tfidf_idf_dedup.rs: 4 tests covering dedup behaviour, IDF emptiness,
  JOIN-pruning, empty-query safety
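
The difference between swallowing row errors with `filter_map(|r| r.ok())` and propagating them with `?` can be shown without a database. This is a sketch only: string parsing stands in for row mapping, and both function names are hypothetical.

```rust
use std::num::ParseIntError;

/// Old style: any cell that fails to parse is silently dropped,
/// so the caller cannot tell a short result from a clean one.
pub fn parse_swallowing(raw: &[&str]) -> Vec<i64> {
    raw.iter().filter_map(|s| s.parse::<i64>().ok()).collect()
}

/// New style: the first failure aborts the loop and is returned to the caller.
pub fn parse_propagating(raw: &[&str]) -> Result<Vec<i64>, ParseIntError> {
    let mut out = Vec::with_capacity(raw.len());
    for s in raw {
        out.push(s.parse::<i64>()?);
    }
    Ok(out)
}
```

For the input `["1", "x", "3"]` the first function returns two values with no hint that one was lost, while the second returns an `Err`.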

Wave D — Commands split + nits:
- New src/dump.rs (43 LOC) + src/stats.rs (33 LOC):
  CLI renderers extracted from commands.rs (was inline SQL + format)
- src/commands.rs: thin wrappers, -42 LOC
- src/injection_guard.rs: inline tests removed (-26 LOC), file under 200 LOC threshold
- tests/injection_guard_unit.rs (new): 4 tests in proper integration crate
- src/patterns.rs: INSERT replaced with INSERT...ON CONFLICT...DO UPDATE
  (idempotent re-ingest, uses Wave A's UNIQUE index)
- src/analyze.rs + src/coaccess.rs: filter_map row-error fixes
- src/coaccess.rs: misleading PK comment rewritten

Verify-before-commit (RULE 0.13 §"Verify-before-commit"):
- cargo check --all-targets: PASS (1 unrelated dead-code warning)
- cargo test: 42 passed, 0 failed across 9 test binaries
- STATUS-TRUTH markers aggregated at .claude/agents/_merge/kei-memory-2026-05-01/

Architect-spotted ARCH-MAJOR + ARCH-MINOR + ARCH-NIT findings addressed:
- ARCH-MAJOR Cargo.toml binary-only (Wave B)
- ARCH-MAJOR schema missing indices (Wave A v3)
- ARCH-MAJOR ingest_jsonl choke point (Wave A — extract.rs + classifier.rs)
- ARCH-MAJOR idf O(N·V) per-call rebuild (Wave C)
- ARCH-MINOR patterns no UPSERT (Wave D)
- ARCH-MINOR commands.rs houses dump+stats (Wave D)
- ARCH-MINOR classifier silent contract (Wave A)
- ARCH-MINOR IO error wrapped as rusqlite (Wave A)
- ARCH-MINOR injection_guard inline tests (Wave D)
- ARCH-MINOR tfidf top_similar N round-trips (Wave C)
- ARCH-NIT 3× filter_map(|r| r.ok()) sites (Wave C + D)
- ARCH-NIT coaccess misleading comment (Wave D)

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
cargo-test: PASS (42 tests, 0 failures)
behaviour-verified: yes
follow-up-required:
  - tests/ingest_guard_tests.rs + tests/guard_test_corpus.rs still on #[path] hack (Wave B follow-up note, ~5 LOC)
  - dead_code warning Severity::Warn unused (pre-existing, not blocking)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:10:06 +08:00


//! Integration tests for kei-memory.
//!
//! Constructor Pattern: each test = one scenario, one assertion target.
//! Uses tempfile for per-test isolated sqlite file. Imports the
//! library crate directly (kei-memory now exposes [lib] + [bin]).
use kei_memory::{analyze, coaccess, ingest, patterns, schema, similarity, tfidf};
use rusqlite::Connection;
use std::fs;
use std::io::Write;
use std::path::PathBuf;
use tempfile::TempDir;

fn open_tmp() -> (TempDir, Connection) {
    let dir = tempfile::tempdir().unwrap();
    let db_path = dir.path().join("kei-memory.sqlite");
    let conn = Connection::open(&db_path).unwrap();
    schema::migrate(&conn).unwrap();
    (dir, conn)
}

fn write_jsonl(dir: &TempDir, name: &str, lines: &[&str]) -> PathBuf {
    let p = dir.path().join(name);
    let mut f = fs::File::create(&p).unwrap();
    for l in lines {
        writeln!(f, "{l}").unwrap();
    }
    p
}

#[test]
fn ingest_then_analyze_roundtrip() {
    let (d, conn) = open_tmp();
    let trace = write_jsonl(&d, "s1.jsonl", &[
        r#"{"ts":1700000000,"kind":"tool_use","tool":"Bash","message":"ok"}"#,
        r#"{"ts":1700000010,"kind":"tool_use","tool":"Edit","file_path":"/a.rs"}"#,
        r#"{"ts":1700000020,"kind":"tool_use","tool":"Bash","is_error":true,"message":"permission denied"}"#,
    ]);
    let n = ingest::ingest_jsonl(&conn, "s1", &trace).unwrap();
    assert_eq!(n, 3);
    let hdr = analyze::session_header(&conn, "s1").unwrap().unwrap();
    assert_eq!(hdr.tool_call_count, 3);
    assert_eq!(hdr.error_count, 1);
    let report = analyze::render_report(&conn, "s1", false).unwrap();
    assert!(report.contains("Tool calls: 3"));
    assert!(report.contains("Errors: 1"));
}

#[test]
fn coaccess_counts_pair_within_window() {
    let (d, conn) = open_tmp();
    let trace = write_jsonl(&d, "s2.jsonl", &[
        r#"{"ts":1700000000,"kind":"tool_use","tool":"Edit","file_path":"/a.rs"}"#,
        r#"{"ts":1700000060,"kind":"tool_use","tool":"Edit","file_path":"/b.rs"}"#,
        r#"{"ts":1700000120,"kind":"tool_use","tool":"Edit","file_path":"/a.rs"}"#,
    ]);
    ingest::ingest_jsonl(&conn, "s2", &trace).unwrap();
    let pairs = coaccess::top_pairs(&conn, 10).unwrap();
    assert!(!pairs.is_empty());
    let hit = pairs.iter().find(|(a, b, _)| {
        (a == "/a.rs" && b == "/b.rs") || (a == "/b.rs" && b == "/a.rs")
    });
    assert!(hit.is_some(), "expected pair (/a.rs,/b.rs), got {pairs:?}");
    assert!(hit.unwrap().2 >= 1);
}

#[test]
fn tfidf_similarity_between_known_docs() {
    let (_d, conn) = open_tmp();
    tfidf::index_document(&conn, "sA", "rust cargo workspace conflict build error").unwrap();
    tfidf::index_document(&conn, "sB", "rust cargo workspace conflict ci").unwrap();
    tfidf::index_document(&conn, "sC", "swift xcode simulator audio").unwrap();
    let top = tfidf::top_similar(&conn, "rust cargo workspace", 3).unwrap();
    assert!(!top.is_empty());
    let best = &top[0].0;
    assert!(best == "sA" || best == "sB", "expected sA or sB first, got {best}");
    let worst = top.iter().find(|(id, _)| id == "sC");
    if let Some((_, s)) = worst {
        assert!(*s <= top[0].1, "unrelated doc should not outrank target");
    }
}

#[test]
fn pattern_detection_finds_recurring_class() {
    let (d, conn) = open_tmp();
    let trace = write_jsonl(&d, "s3.jsonl", &[
        r#"{"ts":1700000000,"kind":"tool_use","tool":"Bash","event_class":"worktree_denied","is_error":true}"#,
        r#"{"ts":1700000010,"kind":"tool_use","tool":"Bash","event_class":"worktree_denied","is_error":true}"#,
        r#"{"ts":1700000020,"kind":"tool_use","tool":"Bash","event_class":"worktree_denied","is_error":true}"#,
        r#"{"ts":1700000030,"kind":"tool_use","tool":"Read","event_class":"read_ok"}"#,
    ]);
    ingest::ingest_jsonl(&conn, "s3", &trace).unwrap();
    let hits = patterns::detect_in_session(&conn, "s3").unwrap();
    let wd = hits.iter().find(|h| h.event_class == "worktree_denied");
    assert!(wd.is_some(), "expected worktree_denied pattern");
    assert_eq!(wd.unwrap().count, 3);
}

#[test]
fn stats_counts_sessions_and_events() {
    let (d, conn) = open_tmp();
    let t1 = write_jsonl(&d, "a.jsonl", &[
        r#"{"ts":1,"kind":"tool_use","tool":"Bash"}"#,
        r#"{"ts":2,"kind":"tool_use","tool":"Edit"}"#,
    ]);
    let t2 = write_jsonl(&d, "b.jsonl", &[
        r#"{"ts":3,"kind":"tool_use","tool":"Grep"}"#,
    ]);
    ingest::ingest_jsonl(&conn, "a", &t1).unwrap();
    ingest::ingest_jsonl(&conn, "b", &t2).unwrap();
    let n_sess: i64 = conn.query_row("SELECT COUNT(*) FROM sessions", [], |r| r.get(0)).unwrap();
    let n_evt: i64 = conn.query_row("SELECT COUNT(*) FROM events", [], |r| r.get(0)).unwrap();
    assert_eq!(n_sess, 2);
    assert_eq!(n_evt, 3);
}

#[test]
fn backlog_crud_add_list_clear() {
    let (_d, conn) = open_tmp();
    let now = 1700000000i64;
    conn.execute(
        "INSERT INTO backlog (ts, item) VALUES (?1, ?2)",
        rusqlite::params![now, "item-one"],
    ).unwrap();
    conn.execute(
        "INSERT INTO backlog (ts, item) VALUES (?1, ?2)",
        rusqlite::params![now + 1, "item-two"],
    ).unwrap();
    let open_ct: i64 = conn.query_row(
        "SELECT COUNT(*) FROM backlog WHERE processed = 0", [], |r| r.get(0),
    ).unwrap();
    assert_eq!(open_ct, 2);
    conn.execute("UPDATE backlog SET processed = 1", []).unwrap();
    let after: i64 = conn.query_row(
        "SELECT COUNT(*) FROM backlog WHERE processed = 0", [], |r| r.get(0),
    ).unwrap();
    assert_eq!(after, 0);
}

#[test]
fn cross_session_pattern_needs_two_sessions() {
    let (d, conn) = open_tmp();
    let a = write_jsonl(&d, "a.jsonl", &[
        r#"{"ts":1,"kind":"tool_use","event_class":"foo"}"#,
    ]);
    let b = write_jsonl(&d, "b.jsonl", &[
        r#"{"ts":2,"kind":"tool_use","event_class":"foo"}"#,
    ]);
    ingest::ingest_jsonl(&conn, "a", &a).unwrap();
    ingest::ingest_jsonl(&conn, "b", &b).unwrap();
    let cross = patterns::detect_cross_session(&conn).unwrap();
    let foo = cross.iter().find(|p| p.event_class == "foo");
    assert!(foo.is_some());
    assert_eq!(foo.unwrap().count, 2);
}

#[test]
fn cosine_similarity_sanity() {
    let mut a = std::collections::HashMap::new();
    a.insert("rust".to_string(), 1.0);
    a.insert("cargo".to_string(), 1.0);
    let mut b = std::collections::HashMap::new();
    b.insert("rust".to_string(), 1.0);
    b.insert("cargo".to_string(), 1.0);
    let s_ident = similarity::cosine_tfidf(&a, &b);
    assert!((s_ident - 1.0).abs() < 1e-9);
    let mut c = std::collections::HashMap::new();
    c.insert("swift".to_string(), 1.0);
    let s_ortho = similarity::cosine_tfidf(&a, &c);
    assert!(s_ortho.abs() < 1e-9);
}
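
The final test checks two properties of cosine similarity over sparse term-weight maps: identical vectors score 1 and vectors with no shared terms score 0. The implementation of `similarity::cosine_tfidf` is not shown in this file, so the following is only a sketch of one way such a function could be written. The name `cosine_sparse` and the choice to return 0.0 for a zero-length vector are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical sparse cosine similarity: dot(a, b) / (|a| * |b|).
pub fn cosine_sparse(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    /// Euclidean length of a sparse vector.
    fn norm(m: &HashMap<String, f64>) -> f64 {
        m.values().map(|w| w * w).sum::<f64>().sqrt()
    }

    // Iterate over the smaller map so lookups happen in the larger one.
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let dot: f64 = small
        .iter()
        .filter_map(|(term, w)| large.get(term).map(|v| w * v))
        .sum();

    let denom = norm(a) * norm(b);
    if denom == 0.0 {
        0.0 // assumption: an empty or all-zero vector is similar to nothing
    } else {
        dot / denom
    }
}
```

Only terms present in both maps contribute to the dot product, which is why the `swift` vector in the test above is orthogonal to the `rust`/`cargo` vector.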