KeiSeiKit-1.0/_primitives/_rust/kei-conflict-scan/src/scanners/orphans.rs
Parfii-bot f354aaccfc fix(kei-conflict-scan): close 3 backlog bugs + Phase C draft emission
Closes engine bugs #1, #2, #3 from the user's backlog.md entry dated
2026-05-11 "kei-refactor-engine — 4 false-positive bugs". Bug #4 was
fixed in 6cd99982 (wikilink path-norm + handoff scanner removal).

## Bug #1 — vendored marketplaces skip

The engine was scanning `plugins/marketplaces/claude-plugins-official/`,
vendored upstream code to which Constructor Pattern thresholds don't
apply. ~246 cp-violations came from this tree.

Fix: a central `tree::should_skip_path()` filter that skips any path
component named `marketplaces`, `target`, `node_modules`, or `.git`.
It is applied via `WalkDir::filter_entry()` in `collect_markdown`,
`collect_with_ext`, `scanners::cp::scan`, `scanners::orphans::scan`, and
`scanners::orphans::all_basenames`. `scanners::cp::skip_dir` now
delegates to `should_skip_path`, replacing the older inline
`/target/` substring check.
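The filter can be sketched as a whole-component match. This is a minimal sketch, not the shipped `tree.rs` (its body is not shown here). It assumes `should_skip_path` takes a `&Path` and returns `true` for entries to prune, which is how orphans.rs calls it inside `filter_entry`. The `SKIP_COMPONENTS` constant is a hypothetical name.

```rust
use std::path::{Component, Path};

// Hypothetical constant; the four names come from the commit message.
const SKIP_COMPONENTS: [&str; 4] = ["marketplaces", "target", "node_modules", ".git"];

/// Returns true if any normal component of `path` equals one of the skipped
/// directory names. Matching whole components (not substrings) also catches
/// the directory entry itself (`repo/target`) and relative paths
/// (`target/debug`), so `filter_entry` can prune before descending.
pub fn should_skip_path(path: &Path) -> bool {
    path.components().any(|c| match c {
        Component::Normal(name) => name
            .to_str()
            .is_some_and(|n| SKIP_COMPONENTS.contains(&n)),
        // RootDir, Prefix, CurDir and ParentDir never match a skip name.
        _ => false,
    })
}
```

Because this sketch checks every component, a scan root that itself sits under a directory named `target` or `marketplaces` would be pruned entirely. Matching only the part of the path below the root would avoid that, if it ever matters.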

## Bug #2 — hooks-share-matcher false-positive class

Claude Code hook chains are designed to support N hooks per event.
`scanners::hooks` flagged every pair of hooks sharing a matcher as a
"redundancy conflict": 9 hooks/medium findings in the last deep-sleep
run, every one a false positive.

Fix: `scanners::hooks::scan` is reduced to a no-op stub returning
`Vec::new()`. The module docstring documents the retraction and the
future direction: a real `hooks-validity` scanner (broken shebangs,
missing chmod, syntax errors) would replace it.
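Keeping the scanner as a stub, rather than deleting the module, preserves its entry point, so callers keep compiling and simply receive no findings. A minimal sketch, assuming the hooks scanner shares the `scan(root: &Path) -> Vec<Conflict>` signature seen in `scanners::orphans::scan`. The unit struct is a hypothetical placeholder for `crate::conflict::Conflict`, whose real constructor also takes a category, severity, file list, message, hint and a boolean.

```rust
use std::path::Path;

/// Hypothetical stand-in for `crate::conflict::Conflict`.
#[derive(Debug)]
pub struct Conflict;

/// Retracted hooks-share-matcher check. Claude Code hook chains support
/// N hooks per event, so a shared matcher is intended, not a conflict.
/// The signature is kept so callers need no change; nothing is reported.
/// A future `hooks-validity` scanner (broken shebangs, missing chmod,
/// syntax errors) would replace this body.
pub fn scan(_root: &Path) -> Vec<Conflict> {
    Vec::new()
}
```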

## Bug #3 — `.patch` file not unified diff

Already resolved in a prior commit (the v0.14.1 retraction in patch.rs):
the CLI default is `plan-autoresolve.md`, the Phase C template references
the `-autoresolve.md` suffix, and `write_patch` is a deprecated shim. The
only `.patch` files left are legacy artefacts in sync-repo/reports/; they
are an audit trail, not active output.

## Phase C draft file emission (deep-sleep-trigger-prompt.md §6.d)

The earlier Phase C template emitted only `proposed_rule` markdown
blocks, with no actionable artefacts. New step 6.d extends §6: when
WITH_FORK=1 AND the fork branch was created, Phase C ALSO writes
skeleton draft files into the branch:

  sync-repo/sleep-deep/YYYY-MM-DD/drafts/rules/<slug>.md
  sync-repo/sleep-deep/YYYY-MM-DD/drafts/hooks/<slug>.sh
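The gate and the layout above can be sketched as a small path builder. This is a minimal sketch under stated assumptions: both function names are hypothetical, `base` stands for whatever directory contains `sync-repo/` in the fork branch, WITH_FORK is treated as a string flag that enables drafts only when it equals `1`, and `slug` is assumed to be filesystem-safe already.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical gate for step 6.d: drafts are written only when WITH_FORK=1
/// and the fork branch was actually created.
pub fn drafts_enabled(with_fork: Option<&str>, fork_branch_created: bool) -> bool {
    with_fork == Some("1") && fork_branch_created
}

/// Hypothetical helper returning the (rule, hook) draft paths for one slug:
/// sync-repo/sleep-deep/<date>/drafts/rules/<slug>.md and .../hooks/<slug>.sh
pub fn draft_paths(base: &Path, date: &str, slug: &str) -> (PathBuf, PathBuf) {
    let drafts = base
        .join("sync-repo")
        .join("sleep-deep")
        .join(date) // expected as YYYY-MM-DD
        .join("drafts");
    let rule = drafts.join("rules").join(format!("{slug}.md"));
    let hook = drafts.join("hooks").join(format!("{slug}.sh"));
    (rule, hook)
}
```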

Drafts follow the pattern-codifier-agent Phase 3 templates. Phase C does
NOT register hooks; that is pattern-codifier's job, via the /sleep-review
morning click-flow (skill Phase 3a, added in ~/.claude commit 49a320d).
This closes the loop: Phase C surfaces a draft → the morning review
approves it with a click → pattern-codifier installs it → the hook is
registered in settings.json.

§6.d also requires a smoke test: every emitted `.sh` MUST pass `bash -n`
cleanly, or else be excluded from the commit and listed in the plan
markdown.
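A minimal sketch of that smoke test, assuming the drafts are already on disk and `bash` is on `PATH`. `bash -n` reads a script and parses it without executing any commands, so a non-zero exit marks a syntax error. The function names and the markdown heading for the excluded list are hypothetical. The checker is passed in as a closure so the partition logic can be exercised without spawning bash.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// True when `bash -n <path>` exits 0, i.e. the script parses cleanly.
/// If bash cannot be spawned at all, the draft is conservatively treated
/// as failing.
pub fn bash_syntax_ok(path: &Path) -> bool {
    Command::new("bash")
        .arg("-n")
        .arg(path)
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// Splits emitted drafts into (committable, excluded). Only `.sh` files are
/// checked; rule drafts (`.md`) pass through as committable.
pub fn partition_drafts<F>(drafts: &[PathBuf], check: F) -> (Vec<PathBuf>, Vec<PathBuf>)
where
    F: Fn(&Path) -> bool,
{
    let mut keep = Vec::new();
    let mut excluded = Vec::new();
    for p in drafts {
        let is_sh = p.extension().is_some_and(|x| x == "sh");
        if !is_sh || check(p) {
            keep.push(p.clone());
        } else {
            excluded.push(p.clone());
        }
    }
    (keep, excluded)
}

/// Renders the excluded drafts as a markdown list for the plan file.
pub fn excluded_section(excluded: &[PathBuf]) -> String {
    let mut s = String::from("## Excluded drafts (failed `bash -n`)\n");
    for p in excluded {
        s.push_str(&format!("- {}\n", p.display()));
    }
    s
}
```

In real use the closure would be `bash_syntax_ok`; only the paths in `keep` would then be staged for the commit.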

## Results on ~/.claude/memory/sync-repo (live data)

| Scanner   | Before | After | Delta |
|-----------|-------:|------:|------:|
| orphans   |    108 |     1 |  -107 |
| hooks     |      2 |     0 |    -2 |
| cp        |    174 |     0 |  -174 |
| **TOTAL** |    284 |     1 |  -283 |

On a full ~/.claude scan the total drops from ~1614 (per the 2026-05-11
backlog) to 983 (cp=186 + orphans=797). The orphan count stays high
because the ~/.claude tree contains many out-of-tree memory/chatlogs/ refs.

## Tests

12/12 tests pass on the kei-conflict-scan workspace (4 unit + 8
integration). The pre-existing `oversize_file_flagged` and
`orphan_wikilinks_flagged` are still green, as are
`cross_repo_wikilink_not_flagged` and
`path_prefixed_wikilink_matches_basename`, added in 6cd99982.

Private mirror at ~/Projects/KeiSeiKit/_primitives/_rust/ synced
(4 files: tree.rs, scanners/cp.rs, scanners/orphans.rs,
scanners/hooks.rs).

Closes backlog bugs #1, #2, #3 (tag "engine-noise-2026-05-11").
2026-05-12 18:30:01 +08:00

//! Orphan-reference detector.
//!
//! Finds `[[wikilink]]` references whose targets do not exist anywhere
//! under the root. Targets are matched case-insensitively against file
//! basenames (stems).
//!
//! The earlier `handoffs: - **name**` heuristic was removed (2026-05-12)
//! after a sync-repo scan showed it matched 0 real handoff sections and
//! every match was a prose bold-bullet (e.g. `- **english-jargon** —`).
//! Real handoff syntax in agent-graph repos uses YAML frontmatter, not
//! prose markdown.

use crate::conflict::{Category, Conflict, Severity};
use crate::tree::{read_lossy, rel, should_skip_path};
use regex::Regex;
use std::collections::HashSet;
use std::path::Path;
use walkdir::WalkDir;

/// Builds the lookup index: the lowercased file stem of every file under
/// `root`, skipping subtrees rejected by `should_skip_path`.
fn all_basenames(root: &Path) -> HashSet<String> {
    let mut out = HashSet::new();
    for e in WalkDir::new(root)
        .into_iter()
        .filter_entry(|e| !should_skip_path(e.path()))
        .filter_map(|e| e.ok())
    {
        if e.file_type().is_file() {
            if let Some(stem) = e.path().file_stem().and_then(|s| s.to_str()) {
                out.insert(stem.to_lowercase());
            }
        }
    }
    out
}
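
/// Extracts wikilink targets, trimmed and lowercased. `[[target]]`,
/// `[[target#heading]]`, `[[target|alias]]` and `[[target#heading|alias]]`
/// all yield `target`; a same-file anchor such as `[[#heading]]` does not
/// match.
fn extract_wikilinks(content: &str) -> Vec<String> {
    // Compiled once on first use instead of on every call (`scan` calls
    // this once per `.md` file).
    static WIKILINK_RX: std::sync::LazyLock<Regex> = std::sync::LazyLock::new(|| {
        Regex::new(r"\[\[([^\]\|#]+?)(?:#[^\]]*)?(?:\|[^\]]*)?\]\]").expect("static regex")
    });
    WIKILINK_RX
        .captures_iter(content)
        .map(|c| c[1].trim().to_lowercase())
        .collect()
}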

/// Normalize a wikilink target to a basename comparable against
/// `all_basenames` (a file_stem-based index).
///
/// Returns `None` when the target contains a `..` path segment. Such refs
/// may point outside the scan tree (e.g. `~/.claude/rules/*` from inside a
/// sync-repo MEMORY.md) and cannot be validated by this scanner, so every
/// parent-directory hop is skipped conservatively, even one that stays
/// inside the root (`docs/../escape`).
///
/// For path-prefixed targets (`chatlogs/X/Y`, `concepts/Z`) only the
/// last segment is compared, matching how `all_basenames` builds its
/// index. The `.md` suffix is stripped, as `file_stem` does.
fn normalize_target(raw: &str) -> Option<String> {
    if raw.split('/').any(|seg| seg == "..") {
        return None;
    }
    let bn = raw.rsplit('/').next().unwrap_or(raw);
    let bn = bn.strip_suffix(".md").unwrap_or(bn);
    Some(bn.to_string())
}

/// Walks every `.md` file under `root` (honouring `should_skip_path`) and
/// reports each wikilink whose normalized target matches no file stem in
/// the index. Targets rejected by `normalize_target` (out-of-tree refs)
/// are skipped, not reported.
pub fn scan(root: &Path) -> Vec<Conflict> {
    let index = all_basenames(root);
    let mut out = Vec::new();
    for e in WalkDir::new(root)
        .into_iter()
        .filter_entry(|e| !should_skip_path(e.path()))
        .filter_map(|e| e.ok())
    {
        if !e.file_type().is_file() {
            continue;
        }
        if e.path().extension().is_none_or(|x| x != "md") {
            continue;
        }
        let content = read_lossy(e.path());
        let file_rel = rel(root, e.path());
        for raw in extract_wikilinks(&content) {
            let Some(normalized) = normalize_target(&raw) else {
                continue;
            };
            if !index.contains(&normalized) {
                out.push(orphan(&file_rel, &raw, "wikilink"));
            }
        }
    }
    out
}

fn orphan(file: &str, target: &str, kind: &str) -> Conflict {
    Conflict::new(
        Category::Orphans,
        Severity::Low,
        vec![file.to_string()],
        format!("{} target '{}' not found under root", kind, target),
        "either create the target file or remove the stale reference".to_string(),
        true,
    )
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn cross_repo_ref_skipped() {
        assert_eq!(normalize_target("../../../rules/recurrence-escalate"), None);
        assert_eq!(normalize_target("../foo"), None);
        assert_eq!(normalize_target("docs/../escape"), None);
    }

    #[test]
    fn path_prefixed_target_basenamed() {
        assert_eq!(
            normalize_target("chatlogs/ml-keilab/2026-05-08-something"),
            Some("2026-05-08-something".to_string())
        );
        assert_eq!(
            normalize_target("concepts/keibeta"),
            Some("keibeta".to_string())
        );
    }

    #[test]
    fn plain_basename_passes_through() {
        assert_eq!(
            normalize_target("ai-creative-engine"),
            Some("ai-creative-engine".to_string())
        );
    }

    #[test]
    fn md_suffix_stripped() {
        assert_eq!(
            normalize_target("docs/intro.md"),
            Some("intro".to_string())
        );
    }
}