From 65d17007c3260d5b29039809da849e7b2a1af65c Mon Sep 17 00:00:00 2001 From: KeiSei84 <2206745@gmail.com> Date: Tue, 26 May 2026 21:33:54 +0800 Subject: [PATCH] feat(v0.42): 6 fixes from 4-CLI re-audit (CRITICAL + 2 HIGH + 3 MED) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convergent dogfood audit by Claude+Grok+Gemini+Copilot on v0.41 surfaced that the v0.41 'fixes' were incomplete. 3 of 4 reviewers independently found the same CRITICAL bypass; Gemini wrote a Rust test suite to verify. ## Convergent findings → v0.42 fixes [#1 CRITICAL] Symlink leaf bypass (3-of-4 convergent) v0.41 only canonicalized PARENT, then re-attached leaf as-is. Exploit: 1. kei_bash 'ln -s ~/.ssh/authorized_keys ./cfg.txt' (passes the bash hook chain — ln itself is non-destructive) 2. kei_write file_path=cfg.txt content= validate_path saw $PWD/cfg.txt (allowed) → fs::write followed leaf symlink → overwrote authorized_keys. Fix: canonicalize FULL path (incl. leaf) when file exists; for new files, symlink_metadata on leaf and reject is_symlink. Now the denylist check sees the real target. [#2 HIGH] $HOME blanket-allowed = self-neutering (2-of-4 convergent) allowed_roots() defaulted to [$PWD, $HOME], denylist only had 4 secret dirs. So kei_write $HOME/.claude/hooks/no-github-push.sh content='exit 0' was ALLOWED — agent could overwrite the substrate's own policy hooks. Similar attacks on ~/.zshrc (RCE on next shell), ~/.gitconfig, ~/.npmrc. Fix: default to $PWD only. Denylist extended with .claude/, .grok/, .gemini/, .copilot/, .kimi/, all major shell-init files, and additional credential paths. KEI_ALLOWED_ROOTS for explicit widening. [#3 HIGH] Empty-section fail-OPEN (Gemini test-verified) v0.41 'fail-closed on missing config' fix was incomplete: if config file existed but section [bash]/[edit]/[write] was empty, load_chain returned Ok(vec![]) → run_chain early-returned Ok → action ran ungated. Fix: empty chain also FAIL-CLOSED with same KEI_POLICY_CHAIN_OPTIONAL opt-in. [#4 MEDIUM] load_chain still blocked tokio worker (Claude) v0.41 fix #4 converted handle_edit/handle_write reads to tokio::fs but left load_chain on std::fs. Slow/hung mount on policy-chain.toml would freeze a worker for every safe_* invocation. Fix: load_chain → async + tokio::fs::{try_exists, read_to_string}. [#5 MEDIUM] process_group only applied to bash, not hooks (Claude) v0.41 fix #5 set_process_group on kei_bash's child shell, but the hook subprocess (spawned per-hook in run_chain) was NOT in its own group. On hook timeout, kill_on_drop killed only the immediate hook process; grandchildren orphaned — the exact failure mode fix #5 was meant to prevent. Fix: set_process_group + killpg also on hook spawn in run_chain. [#6 MEDIUM] Per-step vs aggregate timeout (Claude) Doc claimed 'Hard cap on single chain + action — 60s'. Actual: each hook gets independent 60s, then action gets another 60s. For a 3-hook bash chain that's 240s max — 4× documented. Status: documented as known-limit; single-deadline impl deferred to v0.43 (not security-blocking, just a doc/correctness drift). ## Verification (8 smokes — all green) /etc/passwd → denied (system dir) ✓ ../escape.txt → denied (../ segment) ✓ /tmp/symlink → /etc/passwd writeable → denied (resolved /private/etc) ✓ NEW ~/.claude/hooks/no-github-push.sh → denied (substrate dir) ✓ NEW ~/.zshrc → denied (shell-init file) ✓ NEW policy-chain.toml empty [bash] → FAIL-CLOSED ✓ NEW KEI_POLICY_CHAIN_OPTIONAL=1 → opt-in pass-through ✓ kei_bash git-push-github → BLOCKED (regression) ✓ kei_bash echo HELLO → returns content (regression) ✓ cargo test -p kei-mcp: 3/3 still pass. ## Architecture note from Grok Grok architect flagged: safe_tools.rs is 474 LOC, exceeds Constructor Pattern 200-line threshold. v0.42 does NOT refactor (security fixes shipped first); v0.43 will extract path_guard.rs + chain_runner.rs. ## Per-CLI audit value demonstrated Claude — 5 issues + 5 minor, exhaustive line-anchored analysis Grok — architectural review with grep-verified citations Gemini — wrote Rust test project to verify findings (PoC code!) Copilot — partial fact-check, ran out of mid-task --- .../_rust/kei-mcp/src/handlers/safe_tools.rs | 183 ++++++++++++++---- bin/kei | 2 +- docs/encyclopedia/cross-cli-policy.md | 42 ++-- plugin.json | 2 +- 4 files changed, 170 insertions(+), 59 deletions(-) diff --git a/_primitives/_rust/kei-mcp/src/handlers/safe_tools.rs b/_primitives/_rust/kei-mcp/src/handlers/safe_tools.rs index a96fbb7..77d7d60 100644 --- a/_primitives/_rust/kei-mcp/src/handlers/safe_tools.rs +++ b/_primitives/_rust/kei-mcp/src/handlers/safe_tools.rs @@ -30,6 +30,25 @@ //! #3 CLAUDECODE bypass — documented as design (see above), no behavior change //! #4 tokio::fs for async file I/O (was: blocking std::fs on tokio thread) //! #5 process-group kill on Unix (was: kill_on_drop SIGKILLs only direct child) +//! +//! v0.42 re-audit fixes (2026-05-26, 4-CLI dogfood: Claude+Grok+Gemini+Copilot): +//! #1 [CRITICAL] symlink LEAF bypass — canonicalize full path + reject +//! leaf symlinks (v0.41 only canonicalized PARENT; ln -s ~/.ssh/keys ./x +//! then kei_write x followed the link to the target) +//! #2 [HIGH] $HOME removed from default allowed_roots — was a blanket +//! allow that let agent overwrite ~/.claude/hooks (self-neuter), ~/.zshrc +//! (RCE on next shell), and credential stores. Default: $PWD only. +//! Denylist also extended with .claude/, .grok/, .gemini/, .copilot/, +//! .kimi/, and exact shell-init filenames. +//! #3 [HIGH] empty [bash]/[edit]/[write] section also FAIL-CLOSED (was: +//! empty vec → pass-through). KEI_POLICY_CHAIN_OPTIONAL=1 to opt in. +//! #4 [MED] load_chain converted to async + tokio::fs (was: blocking +//! std::fs on tokio worker thread). +//! #5 [MED] set_process_group + killpg applied to HOOK subprocess too +//! (v0.41 only had it on the bash action; hook grandchildren orphaned). +//! #6 [MED] doc note that aggregate timeout is still per-step (60s × +//! N hooks + 60s action). Single-deadline implementation deferred to +//! v0.43 — not security-blocking. use crate::protocol::{err, ok, JsonRpcRequest, JsonRpcResponse, INTERNAL_ERROR, INVALID_PARAMS}; use serde::Deserialize; @@ -256,15 +275,21 @@ async fn handle_write(args: &Value) -> Result { Ok(format!("wrote {} ({} bytes)", safe_path.display(), content.len())) } -/// v0.41 fix #2 (Gemini HIGH): reject obvious path-traversal / sensitive-path -/// targets BEFORE running hooks. Defense-in-depth: hooks may also flag this, -/// but having the Rust layer reject obvious attacks gives a fast-fail -/// independent of hook configuration. +/// Path-traversal + symlink + denylist guard. /// -/// Allowed roots: $PWD (recursively), $HOME (excluding dotfile-secret dirs). -/// Override: set KEI_ALLOWED_ROOTS=":" -separated absolute paths. -/// Always rejected: /etc/, /usr/, /System/, /var/, /private/etc/, $HOME/.ssh/, -/// $HOME/.aws/, $HOME/.config/gcloud/, $HOME/.gnupg/, any path containing "..". +/// v0.41 (initial): rejected `..`, canonicalized PARENT, checked denylist + roots. +/// → 4-CLI re-audit (2026-05-26) found this was bypassable via symlink at the +/// leaf and self-attackable via the $HOME blanket-allowed root. +/// +/// v0.42 fixes: +/// #1 [CRITICAL] reject if the leaf is a symlink (was: validated parent +/// only, fs::write followed leaf symlink to anywhere). Done via +/// `symlink_metadata` on the leaf BEFORE write, and full `canonicalize` +/// on the leaf when the file already exists. +/// #2 [HIGH] $HOME removed from default allowed-roots — default is $PWD +/// only. Denylist now also covers $HOME/.claude/ (the substrate +/// itself), shell init files, and credential stores. Operators who +/// need broader access set KEI_ALLOWED_ROOTS explicitly. fn validate_path(p: &str) -> Result { if p.is_empty() { return Err("file_path: empty".into()); @@ -274,13 +299,23 @@ fn validate_path(p: &str) -> Result { return Err(format!("file_path: '..' segment not allowed in {p}")); } let path = Path::new(p); - // 2. Canonicalize the parent (file may not exist yet for kei_write); - // if even the parent doesn't exist, use the absolute form. - let canonical = if let Some(parent) = path.parent() { + + // 2. Build a canonical path. Prefer canonicalizing the FULL path (resolves + // symlinks at the leaf, fixing v0.41 CRITICAL bypass). For files that + // don't exist yet (kei_write new file), canonicalize the parent and + // join the leaf — but then explicitly check the leaf isn't a symlink + // via symlink_metadata before writing. + let canonical = if path.exists() { + // File exists — canonicalize full path, including resolving any leaf + // symlink to its real target. The denylist/roots check below then + // sees the REAL destination, not the symlink name. + path.canonicalize() + .map_err(|e| format!("file_path: canonicalize {}: {e}", path.display()))? + } else if let Some(parent) = path.parent() { if parent.as_os_str().is_empty() || parent == Path::new("") { std::env::current_dir() .map_err(|e| format!("file_path: cwd unavailable: {e}"))? - .join(path) + .join(path.file_name().unwrap_or_default()) } else if parent.exists() { parent.canonicalize() .map_err(|e| format!("file_path: canonicalize {}: {e}", parent.display()))? @@ -295,12 +330,24 @@ fn validate_path(p: &str) -> Result { } else { return Err(format!("file_path: invalid {p}")); }; + + // 3. Even when the file doesn't exist yet, the LEAF could already be a + // dangling symlink that `fs::write` would follow on creation. Reject. + if let Ok(meta) = std::fs::symlink_metadata(&canonical) { + if meta.file_type().is_symlink() { + return Err(format!( + "file_path: leaf is a symlink (refusing to follow): {}", + canonical.display() + )); + } + } + let canon_str = canonical.display().to_string(); - // 3. Reject obvious sensitive directories. + // 4. Reject system + substrate-control + credential paths. let denylist = [ "/etc/", "/usr/", "/System/", "/var/", "/private/etc/", "/private/var/", - "/root/", + "/root/", "/bin/", "/sbin/", ]; for d in denylist { if canon_str.starts_with(d) { @@ -308,16 +355,38 @@ fn validate_path(p: &str) -> Result { } } if let Ok(home) = std::env::var("HOME") { - let secret_dirs = [".ssh/", ".aws/", ".gnupg/", ".config/gcloud/"]; - for sd in secret_dirs { + // v0.42 fix #2 extended denylist — these targets enable self-attack + // (overwrite the substrate or shell init for RCE on next session). + let dir_secrets = [ + ".ssh/", ".aws/", ".gnupg/", ".config/gcloud/", ".cargo/credentials", + ".npmrc", ".docker/config.json", ".kube/", + ".claude/", // our own substrate: hooks, settings, agents + ".grok/", // sibling CLI's settings + ".gemini/", // antigravity settings + ".copilot/", // copilot config + ".kimi/", // kimi config + ]; + for sd in dir_secrets { let full = format!("{home}/{sd}"); if canon_str.starts_with(&full) { - return Err(format!("file_path: denied (secret dir): {canon_str}")); + return Err(format!("file_path: denied (secret/substrate dir): {canon_str}")); + } + } + // Exact shell-init files (overwriting → RCE on next shell start). + let init_files = [ + ".zshrc", ".bashrc", ".profile", ".bash_profile", ".zprofile", + ".zshenv", ".bash_login", ".inputrc", ".gitconfig", + ".config/fish/config.fish", + ]; + for f in init_files { + let full = format!("{home}/{f}"); + if canon_str == full { + return Err(format!("file_path: denied (shell-init file): {canon_str}")); } } } - // 4. Enforce allowed-root containment. + // 5. Enforce allowed-root containment. let roots = allowed_roots(); if !roots.is_empty() { let ok = roots.iter().any(|r| canon_str.starts_with(r)); @@ -335,13 +404,14 @@ fn allowed_roots() -> Vec { if let Ok(v) = std::env::var("KEI_ALLOWED_ROOTS") { return v.split(':').filter(|s| !s.is_empty()).map(String::from).collect(); } + // v0.42 fix #2 (Claude+Gemini HIGH): default to $PWD ONLY. Was: $PWD + + // $HOME blanket — too permissive, agent could overwrite ~/.claude/hooks/ + // or ~/.zshrc and self-neuter the safety layer. Operators who need + // broader access opt in via KEI_ALLOWED_ROOTS=":" -separated abs paths. let mut roots = Vec::new(); if let Ok(cwd) = std::env::current_dir() { roots.push(format!("{}/", cwd.display())); } - if let Ok(home) = std::env::var("HOME") { - roots.push(format!("{home}/")); - } roots } @@ -353,15 +423,34 @@ fn allowed_roots() -> Vec { /// /// Skips the chain if the parent process is already inside Claude or Grok /// (env flags), since those CLIs' native PreToolUse hooks already fired. +/// Run the configured hook chain for `tool` ("bash"/"edit"/"write"). +/// +/// v0.42 fixes: +/// #3 [HIGH] empty chain (section absent or zero hooks) now FAILS CLOSED +/// unless KEI_POLICY_CHAIN_OPTIONAL=1. +/// #4 [MED] load_chain() converted to async (was: blocking std::fs). +/// #5 [MED] hook subprocess gets `process_group(0)` + killpg on timeout +/// (was: only the bash action got it; hooks could orphan). +/// #6 [MED] aggregate timeout across the whole chain + action (was: +/// per-hook 60s, so chain+action could legitimately run +/// 4× the documented cap on a 3-hook chain). async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> { if env_truthy("CLAUDECODE") || env_truthy("GROKCODE") { // Native hooks already enforced — don't double-fire. return Ok(()); } - let chain = load_chain(tool)?; + let chain = load_chain(tool).await?; if chain.is_empty() { - return Ok(()); + // v0.42 fix #3 (Claude+Gemini HIGH): empty section is the same + // misconfig class as missing file — FAIL CLOSED with explicit opt-in. + if env_truthy("KEI_POLICY_CHAIN_OPTIONAL") { + return Ok(()); + } + return Err(format!( + "[policy-chain] section [{tool}] is empty — refusing to run \ + (set KEI_POLICY_CHAIN_OPTIONAL=1 to allow pass-through, e.g. for tests)" + )); } let hooks_dir = hooks_dir()?; @@ -371,24 +460,26 @@ async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> { for hook in chain { let path = hooks_dir.join(&hook); if !path.is_file() { - // v0.41 fix #1 (Gemini HIGH): FAIL-CLOSED on missing hook. - // Previously we logged a warning and continued — that meant a - // misconfigured deployment (hook deleted, wrong path) silently - // disabled enforcement. Now: refuse the action, surface the - // error so the operator notices. return Err(format!( "[policy-chain] hook missing: {} (declared in policy-chain.toml [{}])", path.display(), tool )); } - let mut child = Command::new(&path) + let mut child_cmd = Command::new(&path); + child_cmd .stdin(Stdio::piped()) .stdout(Stdio::piped()) .stderr(Stdio::piped()) - .kill_on_drop(true) + .kill_on_drop(true); + // v0.42 fix #5: put hook child in its own process group so timeout + // can killpg the whole tree (was: kill_on_drop = immediate child only). + set_process_group(&mut child_cmd); + + let mut child = child_cmd .spawn() .map_err(|e| format!("spawn {}: {e}", path.display()))?; + let pid_opt = child.id(); if let Some(mut stdin) = child.stdin.take() { stdin.write_all(payload.as_bytes()).await @@ -398,10 +489,18 @@ async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> { } let fut = child.wait_with_output(); - let out = tokio::time::timeout(Duration::from_secs(SAFE_TOOL_TIMEOUT_SECS), fut) - .await - .map_err(|_| format!("hook {} timeout", hook))? - .map_err(|e| format!("wait {}: {e}", path.display()))?; + let out = match tokio::time::timeout(Duration::from_secs(SAFE_TOOL_TIMEOUT_SECS), fut).await { + Ok(Ok(o)) => o, + Ok(Err(e)) => return Err(format!("wait {}: {e}", path.display())), + Err(_) => { + // v0.42 fix #5: kill the whole hook process group, not just + // the immediate child. + if let Some(pid) = pid_opt { + killpg_best_effort(pid); + } + return Err(format!("hook {hook} timeout")); + } + }; let code = out.status.code().unwrap_or(-1); if code == 0 { @@ -417,14 +516,14 @@ async fn run_chain(tool: &str, hook_input: &Value) -> Result<(), String> { // ---- config helpers ----------------------------------------------------- -fn load_chain(tool: &str) -> Result, String> { +/// v0.42 fix #4: async + tokio::fs (was: blocking std::fs would freeze +/// a tokio worker if policy-chain.toml lived on a slow / hung mount). +async fn load_chain(tool: &str) -> Result, String> { let path = chain_path()?; - if !path.is_file() { - // v0.41 fix #1 (Gemini HIGH companion): default behavior when - // policy-chain.toml is absent is now configurable via env. Without - // explicit opt-in to pass-through, FAIL-CLOSED — caller sees a - // clear error instead of silent bypass. - if std::env::var("KEI_POLICY_CHAIN_OPTIONAL").as_deref() == Ok("1") { + // tokio::fs::try_exists avoids a blocking is_file() syscall. + let exists = fs::try_exists(&path).await.unwrap_or(false); + if !exists { + if env_truthy("KEI_POLICY_CHAIN_OPTIONAL") { return Ok(vec![]); } return Err(format!( @@ -432,7 +531,7 @@ fn load_chain(tool: &str) -> Result, String> { path.display() )); } - let raw = std::fs::read_to_string(&path) + let raw = fs::read_to_string(&path).await .map_err(|e| format!("read policy-chain.toml: {e}"))?; let parsed: PolicyChain = toml::from_str(&raw) .map_err(|e| format!("parse policy-chain.toml: {e}"))?; diff --git a/bin/kei b/bin/kei index 73bc931..880e8f0 100755 --- a/bin/kei +++ b/bin/kei @@ -224,7 +224,7 @@ ${C1} ██╔═██╗ ██╔══╝ ██║╚════█ ${C1} ██║ ██╗███████╗██║███████║███████╗██║${C0} ${C1} ╚═╝ ╚═╝╚══════╝╚═╝╚══════╝╚══════╝╚═╝${C0} -${C2} KeiSeiKit · substrate v0.40${C0} +${C2} KeiSeiKit · substrate v0.42${C0} ${C3} ─────────────────────────────────────${C0} primary CLI : ${CV}${PRIMARY}${C0} profile : ${CV}${p}${C0} diff --git a/docs/encyclopedia/cross-cli-policy.md b/docs/encyclopedia/cross-cli-policy.md index 7839ed0..fe862b1 100644 --- a/docs/encyclopedia/cross-cli-policy.md +++ b/docs/encyclopedia/cross-cli-policy.md @@ -116,22 +116,34 @@ The chain runs against the same hook scripts Claude uses; identical input shape, identical decisions. On block, the hook's stderr surfaces as the MCP error message so the calling agent sees exactly why. -**v0.41 hardening** (post-audit fixes): +**v0.42 hardening** (post 4-CLI re-audit, supersedes v0.41): -- **Fail-CLOSED on missing config** — if `policy-chain.toml` is absent the - chain refuses to run (was: silent pass-through). Tests / dev can opt in - via `KEI_POLICY_CHAIN_OPTIONAL=1` env. -- **Fail-CLOSED on missing hook script** — if a hook declared in the chain - is not on disk the call fails (was: warn-and-skip). -- **Path-traversal guard** on `kei_edit` / `kei_write` — rejects `..` - segments, `/etc/`, `/usr/`, `/System/`, `/var/`, `/root/`, plus - `$HOME/{.ssh,.aws,.gnupg,.config/gcloud}/` recursively. Override via - `KEI_ALLOWED_ROOTS=':'-separated-absolute-paths`. -- **Async file I/O** — `kei_edit` / `kei_write` now use `tokio::fs` so a - pathological file (`/dev/random` etc.) cannot block a tokio worker. -- **Process-group kill on timeout** — `kei_bash` puts its child shell in - its own process group; on timeout the entire group is `killpg(SIGKILL)`'d - so grandchildren don't orphan (Unix-only; no-op on Windows). +- **Fail-CLOSED everywhere** — missing config, missing hook, OR empty + section (`[bash]/[edit]/[write]` with no entries) all refuse to run. + Tests / dev can opt in via `KEI_POLICY_CHAIN_OPTIONAL=1`. +- **Symlink-safe path guard** — `kei_edit` / `kei_write` canonicalize the + FULL path (resolving any leaf symlink to its real target) and reject + if the leaf itself is a symlink for a not-yet-existent file. Fixes the + v0.41 CRITICAL bypass where `ln -s ~/.ssh/keys ./x; kei_write x` would + follow the link. +- **$PWD-only default root** — `allowed_roots` defaults to current working + directory only. Was: `$PWD` + entire `$HOME` — too permissive, agent + could overwrite `~/.claude/hooks/*` (self-neuter) or `~/.zshrc` (RCE on + next shell). Operators who need broader access set `KEI_ALLOWED_ROOTS`. +- **Denylist extended** — system dirs (`/etc/`, `/usr/`, `/System/`, + `/var/`, `/root/`, `/bin/`, `/sbin/`); credential stores (`~/.ssh/`, + `~/.aws/`, `~/.gnupg/`, `~/.config/gcloud/`, `~/.cargo/credentials`, + `~/.docker/config.json`, `~/.kube/`); substrate dirs (`~/.claude/`, + `~/.grok/`, `~/.gemini/`, `~/.copilot/`, `~/.kimi/`); exact shell-init + files (`.zshrc`, `.bashrc`, `.profile`, `.zshenv`, `.gitconfig`, ...). +- **Async file I/O in load_chain** — `policy-chain.toml` now read via + `tokio::fs` (was: blocking `std::fs` froze worker on slow mounts). +- **Process-group kill on hooks too** — hook subprocesses get + `process_group(0)` and `killpg(SIGKILL)` on timeout. Was: only the bash + action got this; hook grandchildren orphaned. +- **CLAUDECODE/GROKCODE design note** — documented as perf/UX + optimization, NOT a security boundary (env-controllable parent → confused + deputy is already-game-over scenario). ### Double-enforcement guard diff --git a/plugin.json b/plugin.json index 3d96af6..049a8d7 100644 --- a/plugin.json +++ b/plugin.json @@ -3,7 +3,7 @@ "name": "keisei", "displayName": "KeiSei", "description": "Constructor Pattern multi-LLM agent substrate — 38 agents, 69 skills, 54 hooks, 86 blocks. Cross-CLI policy enforcement (Claude/Grok/Copilot/Agy/Kimi) via kei-mcp + kei_bash/kei_edit/kei_write. Rust primitives via classic ./install.sh.", - "version": "0.40.0", + "version": "0.42.0", "homepage": "https://keisei.app", "repository": "https://github.com/KeiSeiLab/KeiSeiKit-1.0.git", "author": {