feat(v0.42): 6 fixes from 4-CLI re-audit (CRITICAL + 2 HIGH + 3 MED)

Convergent dogfood audit by Claude+Grok+Gemini+Copilot on v0.41 surfaced that
the v0.41 'fixes' were incomplete. 3 of 4 reviewers independently found the
same CRITICAL bypass; Gemini wrote a Rust test suite to verify.

## Convergent findings → v0.42 fixes

[#1 CRITICAL] Symlink leaf bypass (3-of-4 convergent)
  v0.41 only canonicalized PARENT, then re-attached leaf as-is. Exploit:
    1. kei_bash 'ln -s ~/.ssh/authorized_keys ./cfg.txt'
       (passes the bash hook chain; ln itself is non-destructive)
    2. kei_write file_path=cfg.txt content=<key>
  validate_path saw $PWD/cfg.txt (allowed) → fs::write followed leaf
  symlink → overwrote authorized_keys.
  Fix: canonicalize FULL path (incl. leaf) when file exists; for new files,
  symlink_metadata on leaf and reject is_symlink. Now the denylist check
  sees the real target.

[#2 HIGH] $HOME blanket-allowed = self-neutering (2-of-4 convergent)
  allowed_roots() defaulted to [$PWD, $HOME], denylist only had 4 secret
  dirs. So kei_write $HOME/.claude/hooks/no-github-push.sh content='exit 0'
  was ALLOWED: the agent could overwrite the substrate's own policy hooks.
  Similar attacks on ~/.zshrc (RCE on next shell), ~/.gitconfig, ~/.npmrc.
  Fix: default to $PWD only. Denylist extended with .claude/, .grok/,
  .gemini/, .copilot/, .kimi/, all major shell-init files, and additional
  credential paths. KEI_ALLOWED_ROOTS for explicit widening.

[#3 HIGH] Empty-section fail-OPEN (Gemini test-verified)
  v0.41 'fail-closed on missing config' fix was incomplete: if the config
  file existed but section [bash]/[edit]/[write] was empty, load_chain
  returned Ok(vec![]) → run_chain early-returned Ok → action ran ungated.
  Fix: empty chain also FAIL-CLOSED with same KEI_POLICY_CHAIN_OPTIONAL
  opt-in.

[#4 MEDIUM] load_chain still blocked tokio worker (Claude)
  v0.41 fix #4 converted handle_edit/handle_write reads to tokio::fs but
  left load_chain on std::fs. Slow/hung mount on policy-chain.toml would
  freeze a worker for every safe_* invocation.
  Fix: load_chain → async + tokio::fs::{try_exists, read_to_string}.

[#5 MEDIUM] process_group only applied to bash, not hooks (Claude)
  v0.41 fix #5 set_process_group on kei_bash's child shell, but the hook
  subprocess (spawned per-hook in run_chain) was NOT in its own group. On
  hook timeout, kill_on_drop killed only the immediate hook process;
  grandchildren orphaned, the exact failure mode fix #5 was meant to
  prevent.
  Fix: set_process_group + killpg also on hook spawn in run_chain.

[#6 MEDIUM] Per-step vs aggregate timeout (Claude)
  Doc claimed 'Hard cap on single chain + action: 60s'. Actual: each hook
  gets an independent 60s, then the action gets another 60s. For a 3-hook
  bash chain that's 240s max, 4× documented.
  Status: documented as known-limit; single-deadline impl deferred to v0.43
  (not security-blocking, just a doc/correctness drift).

## Verification (8 smokes, all green)

  /etc/passwd → denied (system dir) ✓
  ../escape.txt → denied (../ segment) ✓
  /tmp/symlink → /etc/passwd writeable → denied (resolved /private/etc) ✓ NEW
  ~/.claude/hooks/no-github-push.sh → denied (substrate dir) ✓ NEW
  ~/.zshrc → denied (shell-init file) ✓ NEW
  policy-chain.toml empty [bash] → FAIL-CLOSED ✓ NEW
  KEI_POLICY_CHAIN_OPTIONAL=1 → opt-in pass-through ✓
  kei_bash git-push-github → BLOCKED (regression) ✓
  kei_bash echo HELLO → returns content (regression) ✓

cargo test -p kei-mcp: 3/3 still pass.

## Architecture note from Grok

Grok architect flagged: safe_tools.rs is 474 LOC, exceeds Constructor
Pattern 200-line threshold. v0.42 does NOT refactor (security fixes shipped
first); v0.43 will extract path_guard.rs + chain_runner.rs.

## Per-CLI audit value demonstrated

  Claude:  5 issues + 5 minor, exhaustive line-anchored analysis
  Grok:    architectural review with grep-verified citations
  Gemini:  wrote Rust test project to verify findings (PoC code!)
  Copilot: partial fact-check, ran out of mid-task
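
The resolution order described under [#1 CRITICAL] can be sketched in isolation with the standard library only. This is a minimal illustration, not the kei-mcp code: `resolve_for_write` is a hypothetical helper name, and it assumes the caller runs its denylist and allowed-root checks on the returned path afterwards.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of kei-mcp): return the path that a write
/// to `path` would really touch, so later denylist/root checks see the
/// real destination instead of a symlink name.
fn resolve_for_write(path: &Path) -> io::Result<PathBuf> {
    // symlink_metadata inspects the leaf itself and does NOT follow it.
    match fs::symlink_metadata(path) {
        // Leaf is a symlink: resolve it fully. A dangling or looping link
        // cannot be canonicalized, so it is rejected instead of followed.
        Ok(meta) if meta.file_type().is_symlink() => fs::canonicalize(path).map_err(|_| {
            io::Error::new(io::ErrorKind::InvalidInput, "leaf symlink does not resolve")
        }),
        // Leaf exists and is not a symlink: canonicalize the full path so
        // symlinked parent directories are resolved too.
        Ok(_) => fs::canonicalize(path),
        // Leaf does not exist yet (new file): canonicalize the parent and
        // re-attach the leaf name.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            let parent = match path.parent() {
                Some(p) if !p.as_os_str().is_empty() => p.to_path_buf(),
                _ => std::env::current_dir()?,
            };
            let leaf = path.file_name().ok_or_else(|| {
                io::Error::new(io::ErrorKind::InvalidInput, "path has no file name")
            })?;
            Ok(fs::canonicalize(parent)?.join(leaf))
        }
        Err(e) => Err(e),
    }
}
```

With this shape, the `./cfg.txt` link from the exploit resolves to the real `authorized_keys` path before any policy check runs, so a `$HOME/.ssh/` denylist entry would match it. A check-then-write sequence like this still leaves a window between validation and the write itself; the sketch does not address that.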
2026-05-26 21:33:54 +08:00 · 65d17007c3
commit 65d17007c3
parent 8086bec486
4 changed files with 170 additions and 59 deletions
--- a/_primitives/_rust/kei-mcp/src/handlers/safe_tools.rs
+++ b/_primitives/_rust/kei-mcp/src/handlers/safe_tools.rs
@@ -30,6 +30,25 @@
 //!   #3 CLAUDECODE bypass — documented as design (see above), no behavior change
 //!   #4 tokio::fs for async file I/O (was: blocking std::fs on tokio thread)
 //!   #5 process-group kill on Unix (was: kill_on_drop SIGKILLs only direct child)
 //!
 //! v0.42 re-audit fixes (2026-05-26, 4-CLI dogfood: Claude+Grok+Gemini+Copilot):
 //!   #1 [CRITICAL] symlink LEAF bypass — canonicalize full path + reject
 //!      leaf symlinks (v0.41 only canonicalized PARENT; ln -s ~/.ssh/keys ./x
 //!      then kei_write x followed the link to the target)
 //!   #2 [HIGH]     $HOME removed from default allowed_roots — was a blanket
 //!      allow that let agent overwrite ~/.claude/hooks (self-neuter), ~/.zshrc
 //!      (RCE on next shell), and credential stores. Default: $PWD only.
 //!      Denylist also extended with .claude/, .grok/, .gemini/, .copilot/,
 //!      .kimi/, and exact shell-init filenames.
 //!   #3 [HIGH]     empty [bash]/[edit]/[write] section also FAIL-CLOSED (was:
 //!      empty vec → pass-through). KEI_POLICY_CHAIN_OPTIONAL=1 to opt in.
 //!   #4 [MED]      load_chain converted to async + tokio::fs (was: blocking
 //!      std::fs on tokio worker thread).
 //!   #5 [MED]      set_process_group + killpg applied to HOOK subprocess too
 //!      (v0.41 only had it on the bash action; hook grandchildren orphaned).
 //!   #6 [MED]      doc note that aggregate timeout is still per-step (60s ×
 //!      N hooks + 60s action). Single-deadline implementation deferred to
 //!      v0.43 — not security-blocking.
 use crate::protocol::{err, ok, JsonRpcRequest, JsonRpcResponse, INTERNAL_ERROR, INVALID_PARAMS};
 use serde::Deserialize;
@@ -256,15 +275,21 @@ async fn handle_write(args: &Value) -> Result<String, String> {
    Ok(format!("wrote {} ({} bytes)", safe_path.display(), content.len()))
 }
-/// v0.41 fix #2 (Gemini HIGH): reject obvious path-traversal / sensitive-path
+/// Path-traversal + symlink + denylist guard.
 /// targets BEFORE running hooks. Defense-in-depth: hooks may also flag this,
 /// but having the Rust layer reject obvious attacks gives a fast-fail
 /// independent of hook configuration.
 ///
-/// Allowed roots: $PWD (recursively), $HOME (excluding dotfile-secret dirs).
+/// v0.41 (initial): rejected `..`, canonicalized PARENT, checked denylist + roots.
-/// Override: set KEI_ALLOWED_ROOTS=":" -separated absolute paths.
+///   → 4-CLI re-audit (2026-05-26) found this was bypassable via symlink at the
-/// Always rejected: /etc/, /usr/, /System/, /var/, /private/etc/, $HOME/.ssh/,
+///     leaf and self-attackable via the $HOME blanket-allowed root.
-/// $HOME/.aws/, $HOME/.config/gcloud/, $HOME/.gnupg/, any path containing "..".
+///
 /// v0.42 fixes:
 ///   #1 [CRITICAL] reject if the leaf is a symlink (was: validated parent
 ///      only, fs::write followed leaf symlink to anywhere). Done via
 ///      `symlink_metadata` on the leaf BEFORE write, and full `canonicalize`
 ///      on the leaf when the file already exists.
 ///   #2 [HIGH] $HOME removed from default allowed-roots — default is $PWD
 ///      only. Denylist now also covers $HOME/.claude/ (the substrate
 ///      itself), shell init files, and credential stores. Operators who
 ///      need broader access set KEI_ALLOWED_ROOTS explicitly.
 fn validate_path(p: &str) -> Result<PathBuf, String> {
    if p.is_empty() {
        return Err("file_path: empty".into());
@@ -274,13 +299,23 @@ fn validate_path(p: &str) -> Result<PathBuf, String> {
        return Err(format!("file_path: '..' segment not allowed in {p}"));
    }
    let path = Path::new(p);
-    // 2. Canonicalize the parent (file may not exist yet for kei_write);
+
-    //    if even the parent doesn't exist, use the absolute form.
+    // 2. Build a canonical path. Prefer canonicalizing the FULL path (resolves
-    let canonical = if let Some(parent) = path.parent() {
+    //    symlinks at the leaf, fixing v0.41 CRITICAL bypass). For files that
    //    don't exist yet (kei_write new file), canonicalize the parent and
    //    join the leaf — but then explicitly check the leaf isn't a symlink
    //    via symlink_metadata before writing.
    let canonical = if path.exists() {
        // File exists — canonicalize full path, including resolving any leaf
        // symlink to its real target. The denylist/roots check below then
        // sees the REAL destination, not the symlink name.
        path.canonicalize()
            .map_err(|e| format!("file_path: canonicalize {}: {e}", path.display()))?
    } else if let Some(parent) = path.parent() {
        if parent.as_os_str().is_empty() || parent == Path::new("") {
            std::env::current_dir()
                .map_err(|e| format!("file_path: cwd unavailable: {e}"))?
-                .join(path)
+                .join(path.file_name().unwrap_or_default())
        } else if parent.exists() {
            parent.canonicalize()
                .map_err(|e| format!("file_path: canonicalize {}: {e}", parent.display()))?
@@ -295,12 +330,24 @@ fn validate_path(p: &str) -> Result<PathBuf, String> {
    } else {
        return Err(format!("file_path: invalid {p}"));
    };
    // 3. Even when the file doesn't exist yet, the LEAF could already be a
    //    dangling symlink that `fs::write` would follow on creation. Reject.
    if let Ok(meta) = std::fs::symlink_metadata(&canonical) {
        if meta.file_type().is_symlink() {
            return Err(format!(
                "file_path: leaf is a symlink (refusing to follow): {}",
                canonical.display()
            ));
        }
    }
    let canon_str = canonical.display().to_string();
-    // 3. Reject obvious sensitive directories.
+    // 4. Reject system + substrate-control + credential paths.
    let denylist = [
        "/etc/", "/usr/", "/System/", "/var/", "/private/etc/", "/private/var/",
-        "/root/",
+        "/root/", "/bin/", "/sbin/",
    ];
    for d in denylist {
        if canon_str.starts_with(d) {
@@ -308,16 +355,38 @@ fn validate_path(p: &str) -> Result<PathBuf, String> {
        }
    }
    if let Ok(home) = std::env::var("HOME") {
-        let secret_dirs = [".ssh/", ".aws/", ".gnupg/", ".config/gcloud/"];
+        // v0.42 fix #2 extended denylist — these targets enable self-attack
-        for sd in secret_dirs {
+        // (overwrite the substrate or shell init for RCE on next session).
        let dir_secrets = [
            ".ssh/", ".aws/", ".gnupg/", ".config/gcloud/", ".cargo/credentials",
            ".npmrc", ".docker/config.json", ".kube/",
            ".claude/",      // our own substrate: hooks, settings, agents
            ".grok/",        // sibling CLI's settings
            ".gemini/",      // antigravity settings
            ".copilot/",     // copilot config
            ".kimi/",        // kimi config
        ];
        for sd in dir_secrets {
            let full = format!("{home}/{sd}");
            if canon_str.starts_with(&full) {
-                return Err(format!("file_path: denied (secret dir): {canon_str}"));
+                return Err(format!("file_path: denied (secret/substrate dir): {canon_str}"));
            }
        }
        // Exact shell-init files (overwriting → RCE on next shell start).
        let init_files = [
            ".zshrc", ".bashrc", ".profile", ".bash_profile", ".zprofile",
            ".zshenv", ".bash_login", ".inputrc", ".gitconfig",
            ".config/fish/config.fish",
        ];
        for f in init_files {
            let full = format!("{home}/{f}");
            if canon_str == full {
                return Err(format!("file_path: denied (shell-init file): {canon_str}"));
            }
        }
    }
-    // 4. Enforce allowed-root containment.
+    // 5. Enforce allowed-root containment.
    let roots = allowed_roots();
    if !roots.is_empty() {
        let ok = roots.iter().any(|r| canon_str.starts_with(r));
@@ -335,13 +404,14 @@ fn allowed_roots() -> Vec<String> {
    if let Ok(v) = std::env::var("KEI_ALLOWED_ROOTS") {
        return v.split(':').filter(|s| !s.is_empty()).map(String::from).collect();
    }
+    // v0.42 fix #2 (Claude+Gemini HIGH): default to $PWD ONLY. Was: $PWD +
+    // $HOME blanket — too permissive, agent could overwrite ~/.claude/hooks/
+    // or ~/.zshrc and self-neuter the safety layer. Operators who need
+    // broader access opt in via KEI_ALLOWED_ROOTS=":" -separated abs paths.
    let mut roots = Vec::new();
    if let Ok(cwd) = std::env::current_dir() {
        roots.push(format!("{}/", cwd.display()));
    }
-    if let Ok(home) = std::env::var("HOME") {
-        roots.push(format!("{home}/"));
-    }
    roots
 }
@@ -353,15 +423,34 @@ fn allowed_roots() -> Vec<String> {
 ///
 /// Skips the chain if the parent process is already inside Claude or Grok
 /// (env flags), since those CLIs' native PreToolUse hooks already fired.
 /// Run the configured hook chain for `tool` ("bash"/"edit"/"write").
 ///
 /// v0.42 fixes:
 ///   #3 [HIGH]   empty chain (section absent or zero hooks) now FAILS CLOSED
 ///               unless KEI_POLICY_CHAIN_OPTIONAL=1.
 ///   #4 [MED]    load_chain() converted to async (was: blocking std::fs).
 ///   #5 [MED]    hook subprocess gets `process_group(0)` + killpg on timeout
 ///               (was: only the bash action got it; hooks could orphan).
 ///   #6 [MED]    timeout is still per-step, not aggregate: each hook gets
 ///               60s and the action gets another 60s, so a 3-hook chain
 ///               can run up to 240s (4× the documented cap). Single-deadline
 ///               implementation deferred to v0.43.
 async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> {
    if env_truthy("CLAUDECODE") || env_truthy("GROKCODE") {
        // Native hooks already enforced — don't double-fire.
        return Ok(());
    }
-    let chain = load_chain(tool)?;
+    let chain = load_chain(tool).await?;
    if chain.is_empty() {
-        return Ok(());
+        // v0.42 fix #3 (Claude+Gemini HIGH): empty section is the same
        // misconfig class as missing file — FAIL CLOSED with explicit opt-in.
        if env_truthy("KEI_POLICY_CHAIN_OPTIONAL") {
            return Ok(());
        }
        return Err(format!(
            "[policy-chain] section [{tool}] is empty — refusing to run \
             (set KEI_POLICY_CHAIN_OPTIONAL=1 to allow pass-through, e.g. for tests)"
        ));
    }
    let hooks_dir = hooks_dir()?;
@@ -371,24 +460,26 @@ async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> {
    for hook in chain {
        let path = hooks_dir.join(&hook);
        if !path.is_file() {
            // v0.41 fix #1 (Gemini HIGH): FAIL-CLOSED on missing hook.
            // Previously we logged a warning and continued — that meant a
            // misconfigured deployment (hook deleted, wrong path) silently
            // disabled enforcement. Now: refuse the action, surface the
            // error so the operator notices.
            return Err(format!(
                "[policy-chain] hook missing: {} (declared in policy-chain.toml [{}])",
                path.display(), tool
            ));
        }
-        let mut child = Command::new(&path)
+        let mut child_cmd = Command::new(&path);
        child_cmd
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
-            .kill_on_drop(true)
+            .kill_on_drop(true);
        // v0.42 fix #5: put hook child in its own process group so timeout
        // can killpg the whole tree (was: kill_on_drop = immediate child only).
        set_process_group(&mut child_cmd);
        let mut child = child_cmd
            .spawn()
            .map_err(|e| format!("spawn {}: {e}", path.display()))?;
        let pid_opt = child.id();
        if let Some(mut stdin) = child.stdin.take() {
            stdin.write_all(payload.as_bytes()).await
@@ -398,10 +489,18 @@ async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> {
        }
        let fut = child.wait_with_output();
-        let out = tokio::time::timeout(Duration::from_secs(SAFE_TOOL_TIMEOUT_SECS), fut)
+        let out = match tokio::time::timeout(Duration::from_secs(SAFE_TOOL_TIMEOUT_SECS), fut).await {
-            .await
+            Ok(Ok(o)) => o,
-            .map_err(|_| format!("hook {} timeout", hook))?
+            Ok(Err(e)) => return Err(format!("wait {}: {e}", path.display())),
-            .map_err(|e| format!("wait {}: {e}", path.display()))?;
+            Err(_) => {
                // v0.42 fix #5: kill the whole hook process group, not just
                // the immediate child.
                if let Some(pid) = pid_opt {
                    killpg_best_effort(pid);
                }
                return Err(format!("hook {hook} timeout"));
            }
        };
        let code = out.status.code().unwrap_or(-1);
        if code == 0 {
@@ -417,14 +516,14 @@ async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> {
 // ---- config helpers -----------------------------------------------------
-fn load_chain(tool: &str) -> Result<Vec<String>, String> {
+/// v0.42 fix #4: async + tokio::fs (was: blocking std::fs would freeze
 /// a tokio worker if policy-chain.toml lived on a slow / hung mount).
 async fn load_chain(tool: &str) -> Result<Vec<String>, String> {
    let path = chain_path()?;
-    if !path.is_file() {
+    // tokio::fs::try_exists avoids a blocking is_file() syscall.
-        // v0.41 fix #1 (Gemini HIGH companion): default behavior when
+    let exists = fs::try_exists(&path).await.unwrap_or(false);
-        // policy-chain.toml is absent is now configurable via env. Without
+    if !exists {
-        // explicit opt-in to pass-through, FAIL-CLOSED — caller sees a
+        if env_truthy("KEI_POLICY_CHAIN_OPTIONAL") {
-        // clear error instead of silent bypass.
-        if std::env::var("KEI_POLICY_CHAIN_OPTIONAL").as_deref() == Ok("1") {
            return Ok(vec![]);
        }
        return Err(format!(
@@ -432,7 +531,7 @@ fn load_chain(tool: &str) -> Result<Vec<String>, String> {
            path.display()
        ));
    }
-    let raw = std::fs::read_to_string(&path)
+    let raw = fs::read_to_string(&path).await
        .map_err(|e| format!("read policy-chain.toml: {e}"))?;
    let parsed: PolicyChain = toml::from_str(&raw)
        .map_err(|e| format!("parse policy-chain.toml: {e}"))?;
--- a/bin/kei
+++ b/bin/kei
@@ -224,7 +224,7 @@ ${C1}    ██╔═██╗ ██╔══╝  ██║╚════█
 ${C1}    ██║  ██╗███████╗██║███████║███████╗██║${C0}
 ${C1}    ╚═╝  ╚═╝╚══════╝╚═╝╚══════╝╚══════╝╚═╝${C0}
-${C2}    KeiSeiKit · substrate v0.40${C0}
+${C2}    KeiSeiKit · substrate v0.42${C0}
 ${C3}    ─────────────────────────────────────${C0}
      primary CLI    : ${CV}${PRIMARY}${C0}
      profile        : ${CV}${p}${C0}
--- a/docs/encyclopedia/cross-cli-policy.md
+++ b/docs/encyclopedia/cross-cli-policy.md
@@ -116,22 +116,34 @@ The chain runs against the same hook scripts Claude uses; identical input
 shape, identical decisions. On block, the hook's stderr surfaces as the MCP
 error message so the calling agent sees exactly why.
-**v0.41 hardening** (post-audit fixes):
+**v0.42 hardening** (post 4-CLI re-audit, supersedes v0.41):
- **Fail-CLOSED on missing config** — if `policy-chain.toml` is absent the
+- **Fail-CLOSED everywhere** — missing config, missing hook, OR empty
-  chain refuses to run (was: silent pass-through). Tests / dev can opt in
+  section (`[bash]/[edit]/[write]` with no entries) all refuse to run.
-  via `KEI_POLICY_CHAIN_OPTIONAL=1` env.
+  Tests / dev can opt in via `KEI_POLICY_CHAIN_OPTIONAL=1`.
- **Fail-CLOSED on missing hook script** — if a hook declared in the chain
+- **Symlink-safe path guard** — `kei_edit` / `kei_write` canonicalize the
-  is not on disk the call fails (was: warn-and-skip).
+  FULL path (resolving any leaf symlink to its real target) and reject
- **Path-traversal guard** on `kei_edit` / `kei_write` — rejects `..`
+  if the leaf itself is a symlink for a not-yet-existent file. Fixes the
-  segments, `/etc/`, `/usr/`, `/System/`, `/var/`, `/root/`, plus
+  v0.41 CRITICAL bypass where `ln -s ~/.ssh/keys ./x; kei_write x` would
-  `$HOME/{.ssh,.aws,.gnupg,.config/gcloud}/` recursively. Override via
+  follow the link.
-  `KEI_ALLOWED_ROOTS=':'-separated-absolute-paths`.
+- **$PWD-only default root** — `allowed_roots` defaults to current working
- **Async file I/O** — `kei_edit` / `kei_write` now use `tokio::fs` so a
+  directory only. Was: `$PWD` + entire `$HOME` — too permissive, agent
-  pathological file (`/dev/random` etc.) cannot block a tokio worker.
+  could overwrite `~/.claude/hooks/*` (self-neuter) or `~/.zshrc` (RCE on
- **Process-group kill on timeout** — `kei_bash` puts its child shell in
+  next shell). Operators who need broader access set `KEI_ALLOWED_ROOTS`.
-  its own process group; on timeout the entire group is `killpg(SIGKILL)`'d
+- **Denylist extended** — system dirs (`/etc/`, `/usr/`, `/System/`,
-  so grandchildren don't orphan (Unix-only; no-op on Windows).
+  `/var/`, `/root/`, `/bin/`, `/sbin/`); credential stores (`~/.ssh/`,
  `~/.aws/`, `~/.gnupg/`, `~/.config/gcloud/`, `~/.cargo/credentials`,
  `~/.docker/config.json`, `~/.kube/`); substrate dirs (`~/.claude/`,
  `~/.grok/`, `~/.gemini/`, `~/.copilot/`, `~/.kimi/`); exact shell-init
  files (`.zshrc`, `.bashrc`, `.profile`, `.zshenv`, `.gitconfig`, ...).
 - **Async file I/O in load_chain** — `policy-chain.toml` now read via
  `tokio::fs` (was: blocking `std::fs` froze worker on slow mounts).
 - **Process-group kill on hooks too** — hook subprocesses get
  `process_group(0)` and `killpg(SIGKILL)` on timeout. Was: only the bash
  action got this; hook grandchildren orphaned.
 - **CLAUDECODE/GROKCODE design note** — documented as perf/UX
  optimization, NOT a security boundary (env-controllable parent → confused
  deputy is already-game-over scenario).
 ### Double-enforcement guard
--- a/plugin.json
+++ b/plugin.json
@@ -3,7 +3,7 @@
  "name": "keisei",
  "displayName": "KeiSei",
  "description": "Constructor Pattern multi-LLM agent substrate — 38 agents, 69 skills, 54 hooks, 86 blocks. Cross-CLI policy enforcement (Claude/Grok/Copilot/Agy/Kimi) via kei-mcp + kei_bash/kei_edit/kei_write. Rust primitives via classic ./install.sh.",
-  "version": "0.40.0",
+  "version": "0.42.0",
  "homepage": "https://keisei.app",
  "repository": "https://github.com/KeiSeiLab/KeiSeiKit-1.0.git",
  "author": {