KeiSeiKit-1.0/_primitives/_rust/kei-memory/src/analyze.rs
Parfii-bot eedffd1cd2 feat(kei-memory): functional schema fix + 4-wave architecture refactor
Wave A — Functional ingest fix (root cause of empty Sleep reports):
- Rewrote TraceLine struct to match real Claude Code trace JSONL:
  type (was kind), timestamp ISO8601 (was epoch ts), message Object,
  cwd / gitBranch / parentUuid / uuid / subtype / toolUseID / toolUseResult
- New src/extract.rs: extract_tool_uses + extract_tool_result walks
  message.content[] for nested tool_use / tool_result blocks
- New src/classifier.rs: explicit table classifier (tool_error, user_correction,
  retry_loop, permission_denied, tool_use:<name>, ...) replaces shallow heuristic
- New src/error.rs: KeiMemoryError enum (IO/Parse/Db) replaces semantic
  mismatch where IO error was wrapped as rusqlite::InvalidParameterName
- New src/trace_line.rs: TraceLine + helpers (cube extraction)
- Schema migration v3: events.cwd column + 3 hot-query indices
  (events.tool, events.file_path, events.ts) + UNIQUE on patterns
- New tests/ingest_real_trace.rs: synth-fixture asserts tool/file/cwd/class extraction
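
The "explicit table classifier" idea from Wave A can be illustrated with a minimal sketch. This is not the contents of src/classifier.rs: the `Event` struct, the two predicates, the rule order, and the `other` fallback label are all hypothetical, and only the class labels `permission_denied`, `tool_error`, and `tool_use:<name>` are taken from the list above.

```rust
/// Hypothetical, simplified event; the crate's real TraceLine has more fields.
struct Event {
    tool: Option<String>,
    is_error: bool,
    text: String,
}

/// One table row: a predicate and the class label it assigns.
type Rule = (fn(&Event) -> bool, &'static str);

// Hypothetical predicates, chosen only to illustrate the table shape.
fn is_permission_denied(e: &Event) -> bool {
    e.is_error && e.text.to_lowercase().contains("permission denied")
}

fn is_tool_error(e: &Event) -> bool {
    e.is_error
}

/// Rules are checked top to bottom; the first match wins, so the more
/// specific rule must come before the more general one.
const RULES: &[Rule] = &[
    (is_permission_denied, "permission_denied"),
    (is_tool_error, "tool_error"),
];

/// Classify an event by walking the table, then fall back to the tool name.
fn classify(e: &Event) -> String {
    for (pred, label) in RULES {
        if pred(e) {
            return (*label).to_string();
        }
    }
    match &e.tool {
        Some(name) => format!("tool_use:{name}"),
        // Placeholder fallback label, assumed for this sketch only.
        None => "other".to_string(),
    }
}
```

Keeping the rules in one ordered table makes the precedence between classes visible in a single place, which is the usual argument for a table over scattered heuristics.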

Wave B — Lib crate split:
- Cargo.toml: [lib] target added alongside existing [[bin]]
- src/lib.rs: pub re-export of all 18 modules
- src/main.rs: 11 mod declarations replaced by single use kei_memory::{…}
- tests/integration.rs: #[path] hack replaced by use kei_memory::{…}

Wave C — TF-IDF dedup + single-JOIN + filter_map fix:
- Schema migration v2: tokens.idf_dirty column + flag-based dedup
- index_document no longer triggers per-call recompute_idf rebuild
- top_similar uses single JOIN via vectors_for_overlapping_sessions helper
  (was N round-trips, one session_vector per candidate)
- All filter_map(|r| r.ok()) row-error swallowing replaced with ? propagation
- New tests/tfidf_idf_dedup.rs: 4 tests covering dedup behaviour, IDF emptiness,
  JOIN-pruning, empty-query safety
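
The difference between `filter_map(|r| r.ok())` and `?` propagation can be shown with the standard library alone. The sketch below uses string parsing as a stand-in for row decoding; the function names are hypothetical and no rusqlite types are involved.

```rust
use std::num::ParseIntError;

/// Lossy variant: items that fail to parse are silently dropped,
/// so the caller cannot tell a clean result from a partial one.
fn parse_lossy(rows: &[&str]) -> Vec<i64> {
    rows.iter().filter_map(|r| r.parse::<i64>().ok()).collect()
}

/// Strict variant: collecting into `Result<Vec<_>, _>` stops at the
/// first failing item and returns its error to the caller.
fn parse_strict(rows: &[&str]) -> Result<Vec<i64>, ParseIntError> {
    rows.iter().map(|r| r.parse::<i64>()).collect()
}
```

With the input `["1", "oops", "3"]`, `parse_lossy` returns the two values that parsed, while `parse_strict` returns an error that the caller can propagate with `?`.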

Wave D — Commands split + nits:
- New src/dump.rs (43 LOC) + src/stats.rs (33 LOC):
  CLI renderers extracted from commands.rs (was inline SQL + format)
- src/commands.rs: thin wrappers, -42 LOC
- src/injection_guard.rs: inline tests removed (-26 LOC), file under 200 LOC threshold
- tests/injection_guard_unit.rs (new): 4 tests in proper integration crate
- src/patterns.rs: INSERT replaced with INSERT...ON CONFLICT...DO UPDATE
  (idempotent re-ingest, uses Wave A's UNIQUE index)
- src/analyze.rs + src/coaccess.rs: filter_map row-error fixes
- src/coaccess.rs: misleading PK comment rewritten

Verify-before-commit (RULE 0.13 §"Verify-before-commit"):
- cargo check --all-targets: PASS (1 unrelated dead-code warning)
- cargo test: 42 passed, 0 failed across 9 test binaries
- STATUS-TRUTH markers aggregated at .claude/agents/_merge/kei-memory-2026-05-01/

Architect-spotted ARCH-MAJOR + ARCH-MINOR + ARCH-NIT findings addressed:
- ARCH-MAJOR Cargo.toml binary-only (Wave B)
- ARCH-MAJOR schema missing indices (Wave A v3)
- ARCH-MAJOR ingest_jsonl choke point (Wave A — extract.rs + classifier.rs)
- ARCH-MAJOR idf O(N·V) per-call rebuild (Wave C)
- ARCH-MINOR patterns no UPSERT (Wave D)
- ARCH-MINOR commands.rs houses dump+stats (Wave D)
- ARCH-MINOR classifier silent contract (Wave A)
- ARCH-MINOR IO error wrapped as rusqlite (Wave A)
- ARCH-MINOR injection_guard inline tests (Wave D)
- ARCH-MINOR tfidf top_similar N round-trips (Wave C)
- ARCH-NIT 3× filter_map(|r| r.ok()) sites (Wave C + D)
- ARCH-NIT coaccess misleading comment (Wave D)

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
cargo-test: PASS (42 tests, 0 failures)
behaviour-verified: yes
follow-up-required:
  - tests/ingest_guard_tests.rs + tests/guard_test_corpus.rs still on #[path] hack (Wave B follow-up note, ~5 LOC)
  - dead_code warning Severity::Warn unused (pre-existing, not blocking)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:10:06 +08:00


//! Session retrospective: duration, tool counts, files, errors, time-wasters.
//!
//! Constructor Pattern: one cube, one read-only responsibility.
//! Output is plain text, returned as a `String` for the caller to print.
//! Callers can pass `--summary` for a one-liner suitable for appending to
//! audit-backlog.md, or request the full report for review.

use rusqlite::{params, Connection, OptionalExtension, Result};

/// Minimal session-header info, returned as a struct for downstream formatters.
pub struct SessionHeader {
    pub id: String,
    pub started_ts: i64,
    pub ended_ts: Option<i64>,
    pub tool_call_count: i64,
    pub error_count: i64,
}

/// Load the `sessions` row for `id`, or `None` if no such session exists.
pub fn session_header(conn: &Connection, id: &str) -> Result<Option<SessionHeader>> {
    conn.query_row(
        "SELECT id, started_ts, ended_ts, tool_call_count, error_count
         FROM sessions WHERE id = ?1",
        params![id],
        |r| {
            Ok(SessionHeader {
                id: r.get(0)?,
                started_ts: r.get(1)?,
                ended_ts: r.get(2)?,
                tool_call_count: r.get(3)?,
                error_count: r.get(4)?,
            })
        },
    )
    .optional()
}

/// Return the last `n` session ids (most recent first).
pub fn recent_session_ids(conn: &Connection, n: usize) -> Result<Vec<String>> {
    let mut stmt = conn.prepare(
        "SELECT id FROM sessions ORDER BY COALESCE(ended_ts, started_ts) DESC LIMIT ?1",
    )?;
    let rows = stmt
        .query_map(params![n as i64], |r| r.get::<_, String>(0))?
        .collect::<Result<Vec<_>>>()?;
    Ok(rows)
}

/// Return (tool, count) pairs ordered by invocation count DESC.
pub fn top_tools(conn: &Connection, session_id: &str, limit: usize) -> Result<Vec<(String, i64)>> {
    let mut stmt = conn.prepare(
        "SELECT tool, COUNT(*) FROM events
         WHERE session_id = ?1 AND tool IS NOT NULL
         GROUP BY tool ORDER BY COUNT(*) DESC LIMIT ?2",
    )?;
    let rows = stmt
        .query_map(params![session_id, limit as i64], |r| {
            Ok((r.get::<_, String>(0)?, r.get::<_, i64>(1)?))
        })?
        .collect::<Result<Vec<_>>>()?;
    Ok(rows)
}

/// Return (file_path, count) for the most-touched files in a session.
pub fn top_files(conn: &Connection, session_id: &str, limit: usize) -> Result<Vec<(String, i64)>> {
    let mut stmt = conn.prepare(
        "SELECT file_path, COUNT(*) FROM events
         WHERE session_id = ?1 AND file_path IS NOT NULL
         GROUP BY file_path ORDER BY COUNT(*) DESC LIMIT ?2",
    )?;
    let rows = stmt
        .query_map(params![session_id, limit as i64], |r| {
            Ok((r.get::<_, String>(0)?, r.get::<_, i64>(1)?))
        })?
        .collect::<Result<Vec<_>>>()?;
    Ok(rows)
}

/// Render a retrospective for one session and return it as a `String`.
///
/// With `summary_only` set, the result is a single summary line; otherwise it
/// is the full report with the top 5 tools and top 10 files.
pub fn render_report(conn: &Connection, session_id: &str, summary_only: bool) -> Result<String> {
    let hdr = match session_header(conn, session_id)? {
        Some(h) => h,
        None => return Ok(format!("(no session with id {session_id})\n")),
    };
    // A session without `ended_ts` falls back to `started_ts`, giving a duration of 0.
    let duration = hdr.ended_ts.unwrap_or(hdr.started_ts) - hdr.started_ts;
    if summary_only {
        return Ok(format!(
            "session={} dur={}s tools={} errors={}\n",
            hdr.id, duration, hdr.tool_call_count, hdr.error_count
        ));
    }
    let mut out = String::new();
    out.push_str(&format!("=== SESSION {} ===\n", hdr.id));
    out.push_str(&format!("Duration: {}s\n", duration));
    out.push_str(&format!("Tool calls: {}\n", hdr.tool_call_count));
    out.push_str(&format!("Errors: {}\n", hdr.error_count));
    out.push_str("\nTop tools:\n");
    for (t, c) in top_tools(conn, session_id, 5)? {
        out.push_str(&format!(" {c:>4} {t}\n"));
    }
    out.push_str("\nTop files:\n");
    for (f, c) in top_files(conn, session_id, 10)? {
        out.push_str(&format!(" {c:>4} {f}\n"));
    }
    Ok(out)
}

/// Render reports for the `n` most recent sessions and concatenate them.
///
/// Full reports are separated by a blank line; summaries are one line each.
pub fn render_recent(conn: &Connection, n: usize, summary_only: bool) -> Result<String> {
    let ids = recent_session_ids(conn, n)?;
    if ids.is_empty() {
        return Ok("(no sessions ingested yet)\n".into());
    }
    let mut out = String::new();
    for id in ids {
        out.push_str(&render_report(conn, &id, summary_only)?);
        if !summary_only {
            out.push('\n');
        }
    }
    Ok(out)
}