Wave A — Functional ingest fix (root cause of empty Sleep reports):
- Rewrote TraceLine struct to match real Claude Code trace JSONL:
  type (was kind), timestamp ISO8601 (was epoch ts), message Object,
  cwd / gitBranch / parentUuid / uuid / subtype / toolUseID / toolUseResult
- New src/extract.rs: extract_tool_uses + extract_tool_result walk
  message.content[] for nested tool_use / tool_result blocks
- New src/classifier.rs: explicit table classifier (tool_error, user_correction,
  retry_loop, permission_denied, tool_use:<name>, ...) replaces shallow heuristic
- New src/error.rs: KeiMemoryError enum (IO/Parse/Db) replaces the semantic
  mismatch where an IO error was wrapped as rusqlite::Error::InvalidParameterName
- New src/trace_line.rs: TraceLine + helpers (cube extraction)
- Schema migration v3: events.cwd column + 3 hot-query indices
  (events.tool, events.file_path, events.ts) + UNIQUE on patterns
- New tests/ingest_real_trace.rs: synth-fixture asserts tool/file/cwd/class extraction
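
The error-type change in Wave A can be sketched as follows. This is a minimal, std-only illustration and not the contents of src/error.rs: the name `SketchError`, the variant payloads, and the helper `count_trace_lines` are assumptions. The real Db variant presumably wraps a rusqlite error; a plain `String` is used here so the block compiles without dependencies.

```rust
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical stand-in for the IO/Parse/Db split described above.
#[derive(Debug)]
pub enum SketchError {
    /// Filesystem failures keep their original `io::Error`.
    Io(io::Error),
    /// Malformed input, with a human-readable reason.
    Parse(String),
    /// Database failures; a `String` only to avoid a rusqlite dependency.
    Db(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "io error: {e}"),
            SketchError::Parse(m) => write!(f, "parse error: {m}"),
            SketchError::Db(m) => write!(f, "db error: {m}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Lets `?` convert an `io::Error` without disguising it as another kind.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Hypothetical helper: read a trace file and count its non-empty lines.
pub fn count_trace_lines(path: &str) -> Result<usize, SketchError> {
    let text = fs::read_to_string(path)?; // io::Error becomes SketchError::Io
    Ok(text.lines().filter(|l| !l.trim().is_empty()).count())
}
```

With a dedicated variant per failure source, a caller can match on `SketchError::Io` directly instead of unpacking a database error that never came from the database.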
Wave B — Lib crate split:
- Cargo.toml: [lib] target added alongside existing [[bin]]
- src/lib.rs: pub re-export of all 18 modules
- src/main.rs: 11 mod declarations replaced by single use kei_memory::{…}
- tests/integration.rs: #[path] hack replaced by use kei_memory::{…}
Wave C — TF-IDF dedup + single-JOIN + filter_map fix:
- Schema migration v2: tokens.idf_dirty column + flag-based dedup
- index_document no longer triggers per-call recompute_idf rebuild
- top_similar uses single JOIN via vectors_for_overlapping_sessions helper
  (was N round-trips, one session_vector per candidate)
- All filter_map(|r| r.ok()) row-error swallowing replaced with ? propagation
- New tests/tfidf_idf_dedup.rs: 4 tests covering dedup behaviour, IDF emptiness,
  JOIN-pruning, empty-query safety
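
The difference between swallowing row errors and propagating them can be shown without a database. The sketch below is std-only: `Row`, `collect_lossy`, and `collect_strict` are hypothetical names, and a `Vec` of `Result` values stands in for a row iterator.

```rust
/// Stand-in for one decoded row; the error type is a plain String here.
type Row = Result<i64, String>;

/// Old pattern: `filter_map(|r| r.ok())` silently drops failed rows.
pub fn collect_lossy(rows: Vec<Row>) -> Vec<i64> {
    rows.into_iter().filter_map(|r| r.ok()).collect()
}

/// New pattern: the first failed row aborts the collection and is returned.
pub fn collect_strict(rows: Vec<Row>) -> Result<Vec<i64>, String> {
    let mut out = Vec::new();
    for row in rows {
        out.push(row?); // `?` propagates the row error to the caller
    }
    Ok(out)
}
```

The lossy variant returns a shorter list with no sign that anything went wrong, which is how a decode failure can surface later as an unexplained gap in a report.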
Wave D — Commands split + nits:
- New src/dump.rs (43 LOC) + src/stats.rs (33 LOC):
  CLI renderers extracted from commands.rs (was inline SQL + format)
- src/commands.rs: thin wrappers, -42 LOC
- src/injection_guard.rs: inline tests removed (-26 LOC), file under 200 LOC threshold
- tests/injection_guard_unit.rs (new): 4 tests in proper integration crate
- src/patterns.rs: INSERT replaced with INSERT...ON CONFLICT...DO UPDATE
  (idempotent re-ingest, uses Wave A's UNIQUE index)
- src/analyze.rs + src/coaccess.rs: filter_map row-error fixes
- src/coaccess.rs: misleading PK comment rewritten
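
Why the UPSERT makes re-ingest idempotent can be modelled in memory. This is not the SQL from src/patterns.rs: a `HashMap` stands in for the table with its UNIQUE index, and the `(session, pattern)` key, the `count` payload, and the `PatternTable` type are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a table with a UNIQUE key on (session, pattern).
#[derive(Default)]
pub struct PatternTable {
    rows: HashMap<(String, String), u32>,
}

impl PatternTable {
    /// Models INSERT ... ON CONFLICT ... DO UPDATE: a new key is inserted,
    /// an existing key has its value overwritten instead of duplicated.
    pub fn upsert(&mut self, session: &str, pattern: &str, count: u32) {
        self.rows
            .insert((session.to_string(), pattern.to_string()), count);
    }

    /// Number of distinct (session, pattern) rows currently stored.
    pub fn row_count(&self) -> usize {
        self.rows.len()
    }

    /// Stored count for one key, if present.
    pub fn get(&self, session: &str, pattern: &str) -> Option<u32> {
        self.rows
            .get(&(session.to_string(), pattern.to_string()))
            .copied()
    }
}
```

Ingesting the same trace twice calls `upsert` with identical arguments, so the second pass leaves the table unchanged; a plain INSERT against the same UNIQUE key would instead fail or, without the index, add a duplicate row.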
Verify-before-commit (RULE 0.13 §"Verify-before-commit"):
- cargo check --all-targets: PASS (1 unrelated dead-code warning)
- cargo test: 42 passed, 0 failed across 9 test binaries
- STATUS-TRUTH markers aggregated at .claude/agents/_merge/kei-memory-2026-05-01/
Architect-spotted ARCH-MAJOR + ARCH-MINOR + ARCH-NIT findings addressed:
- ARCH-MAJOR Cargo.toml binary-only (Wave B)
- ARCH-MAJOR schema missing indices (Wave A v3)
- ARCH-MAJOR ingest_jsonl choke point (Wave A — extract.rs + classifier.rs)
- ARCH-MAJOR idf O(N·V) per-call rebuild (Wave C)
- ARCH-MINOR patterns no UPSERT (Wave D)
- ARCH-MINOR commands.rs houses dump+stats (Wave D)
- ARCH-MINOR classifier silent contract (Wave A)
- ARCH-MINOR IO error wrapped as rusqlite (Wave A)
- ARCH-MINOR injection_guard inline tests (Wave D)
- ARCH-MINOR tfidf top_similar N round-trips (Wave C)
- ARCH-NIT 3× filter_map(|r| r.ok()) sites (Wave C + D)
- ARCH-NIT coaccess misleading comment (Wave D)
=== STATUS-TRUTH MARKER ===
shipped: functional
stubs: 0
cargo-check: PASS
cargo-test: PASS (42 tests, 0 failures)
behaviour-verified: yes
follow-up-required:
- tests/ingest_guard_tests.rs + tests/guard_test_corpus.rs still on #[path] hack (Wave B follow-up note, ~5 LOC)
- dead_code warning Severity::Warn unused (pre-existing, not blocking)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
132 lines
4.4 KiB
Rust
//! TraceLine — superset of real-trace + legacy-flat trace fields.
//!
//! Constructor Pattern: this cube only declares the deserialised line
//! plus tiny helpers (text extraction, ts resolution). Decoding is
//! `serde_json` driven; persistence + classification live elsewhere.
//!
//! Real Claude Code trace shape (sample 51a176c0-*.jsonl, 2026-04-30):
//!   {"type": "assistant" | "user" | ..., "timestamp": "<rfc3339>",
//!    "sessionId": "...", "cwd": "...", "gitBranch": "...",
//!    "uuid": "...", "parentUuid": "...",
//!    "message": {"role": "...", "content": [...]}}
//!
//! Legacy KeiSeiKit flat shape (still supported for back-compat tests):
//!   {"ts": 1700000000, "kind": "tool_use", "tool": "Bash",
//!    "file_path": "...", "is_error": false, "message": "..."}

use crate::extract::parse_timestamp_to_epoch;
use chrono::Utc;
use serde::Deserialize;
use serde_json::Value;

#[derive(Debug, Deserialize, Default)]
pub struct TraceLine {
    // ----- real Claude Code trace -----
    #[serde(rename = "type", default)]
    pub kind: Option<String>,
    #[serde(default)]
    pub timestamp: Option<String>,
    #[serde(rename = "sessionId", default)]
    pub session_id: Option<String>,
    #[serde(default)]
    pub cwd: Option<String>,
    #[serde(rename = "gitBranch", default)]
    pub git_branch: Option<String>,
    #[serde(rename = "parentUuid", default)]
    pub parent_uuid: Option<String>,
    #[serde(default)]
    pub uuid: Option<String>,
    #[serde(default)]
    pub subtype: Option<String>,
    #[serde(default)]
    pub message: Option<Value>,
    #[serde(rename = "toolUseID", default)]
    pub tool_use_id: Option<String>,
    #[serde(rename = "toolUseResult", default)]
    pub tool_use_result: Option<Value>,
    // ----- legacy KeiSeiKit flat -----
    #[serde(default)]
    pub ts: Option<i64>,
    #[serde(default)]
    pub tool: Option<String>,
    #[serde(default)]
    pub file_path: Option<String>,
    #[serde(default)]
    pub is_error: Option<bool>,
    #[serde(default)]
    pub event_class: Option<String>,
}

impl TraceLine {
    /// Best-effort plain text from `message` field for guard + persist.
    /// Returns None when message is absent or not a JSON String/Object.
    /// For object-form messages, serializes back to JSON for persistence.
    pub fn message_text(&self) -> Option<String> {
        match self.message.as_ref()? {
            Value::String(s) => Some(s.clone()),
            v @ Value::Object(_) => Some(v.to_string()),
            _ => None,
        }
    }

    /// Resolve event timestamp, preferring legacy `ts` (epoch i64) over
    /// real-trace `timestamp` (RFC-3339 string), falling back to "now".
    pub fn resolved_ts(&self) -> i64 {
        if let Some(t) = self.ts {
            return t;
        }
        if let Some(s) = self.timestamp.as_deref() {
            if let Some(epoch) = parse_timestamp_to_epoch(s) {
                return epoch;
            }
        }
        Utc::now().timestamp()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn deserialize_real_trace_assistant_line() {
        let json = r#"{"type":"assistant","timestamp":"2026-04-30T18:27:10Z",
            "sessionId":"sx","cwd":"/x","gitBranch":"main","uuid":"u1",
            "message":{"role":"assistant","content":[
                {"type":"tool_use","id":"t1","name":"Read","input":{"file_path":"/a"}}
            ]}}"#;
        let t: TraceLine = serde_json::from_str(json).unwrap();
        assert_eq!(t.kind.as_deref(), Some("assistant"));
        assert_eq!(t.cwd.as_deref(), Some("/x"));
        assert!(t.message.is_some());
    }

    #[test]
    fn deserialize_legacy_flat_line() {
        let json = r#"{"ts":1700000000,"kind":"tool_use","tool":"Bash","message":"ok"}"#;
        let t: TraceLine = serde_json::from_str(json).unwrap();
        assert_eq!(t.ts, Some(1700000000));
        assert_eq!(t.tool.as_deref(), Some("Bash"));
        assert_eq!(t.message_text().as_deref(), Some("ok"));
    }

    #[test]
    fn message_text_object_serialises_back() {
        let t = TraceLine {
            message: Some(serde_json::json!({"role":"user"})),
            ..Default::default()
        };
        let s = t.message_text().unwrap();
        assert!(s.contains("\"role\""));
    }

    #[test]
    fn resolved_ts_prefers_ts_over_timestamp() {
        let t = TraceLine {
            ts: Some(42),
            timestamp: Some("2026-04-30T18:27:10Z".into()),
            ..Default::default()
        };
        assert_eq!(t.resolved_ts(), 42);
    }
}