KeiSeiKit-1.0/_primitives/_rust/kei-memory/src/trace_line.rs
Parfii-bot 902fb3e81a feat(kei-memory): functional schema fix + 4-wave architecture refactor
Wave A — Functional ingest fix (root cause of empty Sleep reports):
- Rewrote TraceLine struct to match real Claude Code trace JSONL:
  type (was kind), timestamp ISO8601 (was epoch ts), message Object,
  cwd / gitBranch / parentUuid / uuid / subtype / toolUseID / toolUseResult
- New src/extract.rs: extract_tool_uses + extract_tool_result walks
  message.content[] for nested tool_use / tool_result blocks
- New src/classifier.rs: explicit table classifier (tool_error, user_correction,
  retry_loop, permission_denied, tool_use:<name>, ...) replaces shallow heuristic
- New src/error.rs: KeiMemoryError enum (IO/Parse/Db) replaces semantic
  mismatch where IO error was wrapped as rusqlite::InvalidParameterName
- New src/trace_line.rs: TraceLine + helpers (cube extraction)
- Schema migration v3: events.cwd column + 3 hot-query indices
  (events.tool, events.file_path, events.ts) + UNIQUE on patterns
- New tests/ingest_real_trace.rs: synth-fixture asserts tool/file/cwd/class extraction
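A minimal sketch of the Wave A error split, assuming an enum shaped like the `KeiMemoryError` (IO/Parse/Db) named above. The `Db` payload is a `String` here rather than whatever rusqlite type the crate actually wraps, and `count_trace_lines` is a hypothetical entry point. The point it illustrates: with a `From<io::Error>` impl, `?` on a file read yields an `Io` variant instead of being disguised as a database error.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for the crate's error enum. `Db` would wrap a
/// database error in the real crate; a String keeps this sketch std-only.
#[derive(Debug)]
pub enum KeiMemoryError {
    Io(io::Error),
    Parse(String),
    Db(String),
}

impl fmt::Display for KeiMemoryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KeiMemoryError::Io(e) => write!(f, "io error: {e}"),
            KeiMemoryError::Parse(m) => write!(f, "parse error: {m}"),
            KeiMemoryError::Db(m) => write!(f, "db error: {m}"),
        }
    }
}

impl std::error::Error for KeiMemoryError {
    // Expose the underlying io::Error so callers can inspect the cause chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            KeiMemoryError::Io(e) => Some(e),
            _ => None,
        }
    }
}

// Lets `?` convert io::Error into the Io variant automatically.
impl From<io::Error> for KeiMemoryError {
    fn from(e: io::Error) -> Self {
        KeiMemoryError::Io(e)
    }
}

/// Hypothetical ingest helper: counts non-blank lines in a JSONL trace file.
pub fn count_trace_lines(path: &str) -> Result<usize, KeiMemoryError> {
    let text = std::fs::read_to_string(path)?; // io::Error -> KeiMemoryError::Io
    Ok(text.lines().filter(|l| !l.trim().is_empty()).count())
}
```

Calling `count_trace_lines` on a missing path returns `Err(KeiMemoryError::Io(_))` whose `kind()` is `NotFound`, so callers can match on the failure class instead of parsing a misleading database message.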

Wave B — Lib crate split:
- Cargo.toml: [lib] target added alongside existing [[bin]]
- src/lib.rs: pub re-export of all 18 modules
- src/main.rs: 11 mod declarations replaced by single use kei_memory::{…}
- tests/integration.rs: #[path] hack replaced by use kei_memory::{…}

Wave C — TF-IDF dedup + single-JOIN + filter_map fix:
- Schema migration v2: tokens.idf_dirty column + flag-based dedup
- index_document no longer triggers per-call recompute_idf rebuild
- top_similar uses single JOIN via vectors_for_overlapping_sessions helper
  (was N round-trips, one session_vector per candidate)
- All filter_map(|r| r.ok()) row-error swallowing replaced with ? propagation
- New tests/tfidf_idf_dedup.rs: 4 tests covering dedup behaviour, IDF emptiness,
  JOIN-pruning, empty-query safety
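A rough in-memory analogue of the Wave C dedup, as a sketch only: `IdfIndex` and its methods are hypothetical, and it uses one index-wide dirty flag instead of the per-token `tokens.idf_dirty` column. `index_document` only updates document frequencies and marks the index dirty; the O(V) rebuild of idf = ln(N / df) runs at most once per batch of writes, on the next query.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical deferred-IDF index (single dirty flag, not per-token).
#[derive(Default)]
pub struct IdfIndex {
    doc_count: usize,
    doc_freq: HashMap<String, usize>,
    idf_cache: HashMap<String, f64>,
    dirty: bool,
    rebuilds: usize, // counts O(V) rebuilds, for illustration only
}

impl IdfIndex {
    /// Cost is proportional to this document's tokens; no IDF rebuild here.
    pub fn index_document(&mut self, tokens: &[&str]) {
        self.doc_count += 1;
        // Document frequency counts each token at most once per document.
        let unique: HashSet<&str> = tokens.iter().copied().collect();
        for t in unique {
            *self.doc_freq.entry(t.to_string()).or_insert(0) += 1;
        }
        self.dirty = true; // defer the rebuild to the next read
    }

    /// Rebuild idf = ln(N / df) for every token, only if writes happened.
    fn refresh(&mut self) {
        if !self.dirty {
            return;
        }
        let n = self.doc_count as f64;
        self.idf_cache = self
            .doc_freq
            .iter()
            .map(|(t, &df)| (t.clone(), (n / df as f64).ln()))
            .collect();
        self.dirty = false;
        self.rebuilds += 1;
    }

    /// Query path: pays for a rebuild only when the index changed.
    pub fn idf(&mut self, token: &str) -> Option<f64> {
        self.refresh();
        self.idf_cache.get(token).copied()
    }

    pub fn rebuilds(&self) -> usize {
        self.rebuilds
    }
}
```

Indexing three documents and then querying twice triggers exactly one rebuild, where a per-call recompute would have run three. A global flag is the simplest correct choice because N changes every token's IDF; a per-token flag, as the schema column suggests, would need a separate rule for when N moves.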

Wave D — Commands split + nits:
- New src/dump.rs (43 LOC) + src/stats.rs (33 LOC):
  CLI renderers extracted from commands.rs (was inline SQL + format)
- src/commands.rs: thin wrappers, -42 LOC
- src/injection_guard.rs: inline tests removed (-26 LOC); file now under the 200 LOC threshold
- tests/injection_guard_unit.rs (new): 4 tests in proper integration crate
- src/patterns.rs: INSERT replaced with INSERT...ON CONFLICT...DO UPDATE
  (idempotent re-ingest, uses Wave A's UNIQUE index)
- src/analyze.rs + src/coaccess.rs: filter_map row-error fixes
- src/coaccess.rs: misleading PK comment rewritten
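The `filter_map(|r| r.ok())` fix in Waves C and D can be illustrated without a database. In this sketch the rows are a plain `Vec` of per-row decode results, standing in for what a row-mapping query iterator yields; `RowResult` and both function names are hypothetical.

```rust
/// Hypothetical per-row decode result, standing in for a query iterator item.
pub type RowResult = Result<i64, String>;

/// Old pattern: decode errors are silently dropped; the caller gets a short list.
pub fn collect_swallowing(rows: Vec<RowResult>) -> Vec<i64> {
    rows.into_iter().filter_map(|r| r.ok()).collect()
}

/// New pattern: the first failing row aborts the whole read with its error.
pub fn collect_propagating(rows: Vec<RowResult>) -> Result<Vec<i64>, String> {
    let mut out = Vec::new();
    for r in rows {
        out.push(r?); // propagate instead of discarding
    }
    Ok(out)
}
```

With rows `[Ok(1), Err(..), Ok(3)]` the old version returns `[1, 3]` and the corruption goes unnoticed, while the new one returns the error. `rows.into_iter().collect::<Result<Vec<_>, _>>()` is an equivalent one-liner for the propagating form.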

Verify-before-commit (RULE 0.13 §"Verify-before-commit"):
- cargo check --all-targets: PASS (1 unrelated dead-code warning)
- cargo test: 42 passed, 0 failed across 9 test binaries
- STATUS-TRUTH markers aggregated at .claude/agents/_merge/kei-memory-2026-05-01/

Architect-spotted ARCH-MAJOR + ARCH-MINOR + ARCH-NIT findings addressed:
- ARCH-MAJOR Cargo.toml binary-only (Wave B)
- ARCH-MAJOR schema missing indices (Wave A v3)
- ARCH-MAJOR ingest_jsonl choke point (Wave A — extract.rs + classifier.rs)
- ARCH-MAJOR idf O(N·V) per-call rebuild (Wave C)
- ARCH-MINOR patterns no UPSERT (Wave D)
- ARCH-MINOR commands.rs houses dump+stats (Wave D)
- ARCH-MINOR classifier silent contract (Wave A)
- ARCH-MINOR IO error wrapped as rusqlite (Wave A)
- ARCH-MINOR injection_guard inline tests (Wave D)
- ARCH-MINOR tfidf top_similar N round-trips (Wave C)
- ARCH-NIT 3× filter_map(|r| r.ok()) sites (Wave C + D)
- ARCH-NIT coaccess misleading comment (Wave D)

=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
cargo-test: PASS (42 tests, 0 failures)
behaviour-verified: yes
follow-up-required:
  - tests/ingest_guard_tests.rs + tests/guard_test_corpus.rs still on #[path] hack (Wave B follow-up note, ~5 LOC)
  - dead_code warning Severity::Warn unused (pre-existing, not blocking)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:10:06 +08:00


//! TraceLine — superset of real-trace + legacy-flat trace fields.
//!
//! Constructor Pattern: this cube only declares the deserialised line
//! plus tiny helpers (text extraction, ts resolution). Decoding is
//! `serde_json` driven; persistence + classification live elsewhere.
//!
//! Real Claude Code trace shape (sample 51a176c0-*.jsonl, 2026-04-30):
//! {"type": "assistant" | "user" | ..., "timestamp": "<rfc3339>",
//! "sessionId": "...", "cwd": "...", "gitBranch": "...",
//! "uuid": "...", "parentUuid": "...",
//! "message": {"role": "...", "content": [...]}}
//!
//! Legacy KeiSeiKit flat shape (still supported for back-compat tests):
//! {"ts": 1700000000, "kind": "tool_use", "tool": "Bash",
//! "file_path": "...", "is_error": false, "message": "..."}
use crate::extract::parse_timestamp_to_epoch;
use chrono::Utc;
use serde::Deserialize;
use serde_json::Value;

#[derive(Debug, Deserialize, Default)]
pub struct TraceLine {
    // ----- real Claude Code trace -----
    // `kind` is filled only from the real-trace `type` key. The legacy flat
    // `kind` key is not mapped, so serde ignores it as an unknown field.
    #[serde(rename = "type", default)]
    pub kind: Option<String>,
    #[serde(default)]
    pub timestamp: Option<String>,
    #[serde(rename = "sessionId", default)]
    pub session_id: Option<String>,
    #[serde(default)]
    pub cwd: Option<String>,
    #[serde(rename = "gitBranch", default)]
    pub git_branch: Option<String>,
    #[serde(rename = "parentUuid", default)]
    pub parent_uuid: Option<String>,
    #[serde(default)]
    pub uuid: Option<String>,
    #[serde(default)]
    pub subtype: Option<String>,
    // JSON object in real traces, plain string in legacy flat lines.
    #[serde(default)]
    pub message: Option<Value>,
    #[serde(rename = "toolUseID", default)]
    pub tool_use_id: Option<String>,
    #[serde(rename = "toolUseResult", default)]
    pub tool_use_result: Option<Value>,
    // ----- legacy KeiSeiKit flat -----
    #[serde(default)]
    pub ts: Option<i64>,
    #[serde(default)]
    pub tool: Option<String>,
    #[serde(default)]
    pub file_path: Option<String>,
    #[serde(default)]
    pub is_error: Option<bool>,
    #[serde(default)]
    pub event_class: Option<String>,
}
impl TraceLine {
    /// Best-effort plain text from the `message` field, for guard + persist.
    /// Returns `None` when `message` is absent or is neither a JSON string
    /// nor a JSON object. Object-form messages are serialised back to
    /// compact JSON for persistence.
    pub fn message_text(&self) -> Option<String> {
        match self.message.as_ref()? {
            Value::String(s) => Some(s.clone()),
            v @ Value::Object(_) => Some(v.to_string()),
            _ => None,
        }
    }

    /// Resolve the event timestamp, preferring legacy `ts` (epoch i64) over
    /// real-trace `timestamp` (RFC-3339 string), and falling back to "now"
    /// when neither is present or `timestamp` fails to parse.
    pub fn resolved_ts(&self) -> i64 {
        if let Some(t) = self.ts {
            return t;
        }
        if let Some(s) = self.timestamp.as_deref() {
            if let Some(epoch) = parse_timestamp_to_epoch(s) {
                return epoch;
            }
        }
        Utc::now().timestamp()
    }
}
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn deserialize_real_trace_assistant_line() {
        let json = r#"{"type":"assistant","timestamp":"2026-04-30T18:27:10Z",
            "sessionId":"sx","cwd":"/x","gitBranch":"main","uuid":"u1",
            "message":{"role":"assistant","content":[
                {"type":"tool_use","id":"t1","name":"Read","input":{"file_path":"/a"}}
            ]}}"#;
        let t: TraceLine = serde_json::from_str(json).unwrap();
        assert_eq!(t.kind.as_deref(), Some("assistant"));
        assert_eq!(t.cwd.as_deref(), Some("/x"));
        assert!(t.message.is_some());
    }

    #[test]
    fn deserialize_legacy_flat_line() {
        let json = r#"{"ts":1700000000,"kind":"tool_use","tool":"Bash","message":"ok"}"#;
        let t: TraceLine = serde_json::from_str(json).unwrap();
        assert_eq!(t.ts, Some(1700000000));
        assert_eq!(t.tool.as_deref(), Some("Bash"));
        assert_eq!(t.message_text().as_deref(), Some("ok"));
    }

    #[test]
    fn message_text_object_serialises_back() {
        let t = TraceLine {
            message: Some(serde_json::json!({"role":"user"})),
            ..Default::default()
        };
        let s = t.message_text().unwrap();
        assert!(s.contains("\"role\""));
    }

    #[test]
    fn resolved_ts_prefers_ts_over_timestamp() {
        let t = TraceLine {
            ts: Some(42),
            timestamp: Some("2026-04-30T18:27:10Z".into()),
            ..Default::default()
        };
        assert_eq!(t.resolved_ts(), 42);
    }
}
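The tests above do not exercise the "now" fallback of `resolved_ts`, because it reads the wall clock. One way to make that branch testable is to pass the clock in and express the precedence as a single combinator chain. This is a sketch under stated assumptions: `resolve_ts_with` is hypothetical, and `parse_epoch_stub` only parses bare integers, standing in for the real RFC-3339 parser `parse_timestamp_to_epoch` so the example needs no chrono.

```rust
/// Hypothetical stand-in for `parse_timestamp_to_epoch`: accepts only a bare
/// integer string (the real helper parses RFC-3339).
fn parse_epoch_stub(s: &str) -> Option<i64> {
    s.trim().parse().ok()
}

/// Same precedence as `TraceLine::resolved_ts`: legacy `ts`, then a
/// parseable `timestamp`, then the injected clock.
pub fn resolve_ts_with(
    ts: Option<i64>,
    timestamp: Option<&str>,
    now: impl FnOnce() -> i64,
) -> i64 {
    ts.or_else(|| timestamp.and_then(parse_epoch_stub))
        .unwrap_or_else(now)
}
```

With the real parser swapped in, production code would pass `|| Utc::now().timestamp()` while tests pass a constant closure, so an unparseable or missing timestamp resolves to a known value instead of the current time.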