Two resource-exhaustion fixes from Opus Rust + Sonnet Rust audits.
1. kei-cortex per_user_locks DashMap unbounded growth (HIGH)
File: kei-cortex/src/state.rs
Bug: per_user_locks: DashMap<String, Arc<Mutex<()>>> gained an entry for
every distinct user_id and never evicted. An authenticated attacker cycling
through unique user_ids could OOM the daemon (~150 bytes/entry: ~150 MB at
1M entries, ~15 GB at 100M entries).
Fix: replaced DashMap with tokio::sync::Mutex<LruCache<String,
Arc<TokioMutex<()>>>> capped at PER_USER_LOCK_CAP = 1024. Eviction is
safe because callers hold their own Arc clone for their critical section;
dropping the registry slot retires only the registry's reference. Used
tokio::sync::Mutex for the registry because LruCache::get mutates the
recency list and requires &mut self.
Constructor Pattern: state.rs split into state.rs (184 LOC) +
state_factories.rs (64 LOC, new). Tests added: user_lock_evicts_past_cap
(registry stays ≤1024 after 2048 inserts), user_lock_keeps_most_recent
(LRU recency preserved). Existing user_lock_is_stable_per_user +
user_lock_differs_per_user updated to async — sole call site
(handlers/portrait.rs) gains .await.
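
A minimal sketch of the capped registry from fix 1, written against the standard library only so it compiles without extra crates. It assumes a HashMap plus VecDeque as a stand-in for LruCache and std::sync::Mutex in place of tokio::sync::Mutex; UserLocks, Registry, and get_or_insert are hypothetical names, not the ones used in state.rs.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

const PER_USER_LOCK_CAP: usize = 1024;

/// Bounded lock table. `recency` is ordered least-recent first.
/// (Stand-in for an LRU cache; the linear scan below is O(n), which a
/// real LRU implementation avoids.)
struct UserLocks {
    cap: usize,
    slots: HashMap<String, Arc<Mutex<()>>>,
    recency: VecDeque<String>,
}

impl UserLocks {
    fn new(cap: usize) -> Self {
        Self { cap, slots: HashMap::new(), recency: VecDeque::new() }
    }

    /// Needs `&mut self` because every lookup updates the recency order,
    /// which is why the whole table sits behind a single mutex.
    fn get_or_insert(&mut self, user_id: &str) -> Arc<Mutex<()>> {
        if let Some(pos) = self.recency.iter().position(|k| k == user_id) {
            self.recency.remove(pos);
        }
        self.recency.push_back(user_id.to_owned());

        let lock = self
            .slots
            .entry(user_id.to_owned())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone();

        // Evict least-recently-used slots until the table is back under cap.
        // Only the table's reference is dropped; callers keep their own Arc.
        while self.slots.len() > self.cap {
            match self.recency.pop_front() {
                Some(oldest) => {
                    self.slots.remove(&oldest);
                }
                None => break,
            }
        }
        lock
    }
}

struct Registry {
    inner: Mutex<UserLocks>,
}

impl Registry {
    fn new(cap: usize) -> Self {
        Self { inner: Mutex::new(UserLocks::new(cap)) }
    }

    fn user_lock(&self, user_id: &str) -> Arc<Mutex<()>> {
        self.inner.lock().unwrap().get_or_insert(user_id)
    }

    fn len(&self) -> usize {
        self.inner.lock().unwrap().slots.len()
    }
}
```

In this sketch a user that is looked up again after eviction receives a fresh mutex, while any Arc handed out earlier stays valid until its holder drops it.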
2. kei-runtime stdout/stderr cap was post-hoc (HIGH)
File: kei-runtime/src/invoke.rs
Bug: wait_with_output() buffered ALL child stdout/stderr; cap_bytes
truncated only AFTER the child finished. A malicious atom writing 10 GB to
stdout (or a buggy one looping forever) OOM'd the runtime BEFORE the cap fired.
Fix: replaced wait_with_output() with two reader threads sharing
KillHandle = Arc<Mutex<Option<Child>>>. Each reader appends bytes up to
STREAM_CAP = 16 MiB; on cap exceedance the reader KILLS the child from
inside the reader thread (critical — otherwise the unbounded writer would
never EOF and a post-hoc kill would never fire). Both readers drain the
closing pipe to EOF and return. Truncation surfaces as
InvokeError::SubprocessError with an explicit "exceeded N byte cap" message.
Constructor Pattern: invoke.rs decomposed into invoke.rs (159 LOC) +
invoke_io.rs (146 LOC, new) + invoke_error.rs (54 LOC, new). Test added:
invoke_kills_runaway_atom — stages a kei-flood script running cat
/dev/zero, verifies (a) non-zero exit, (b) stdout < 18 MiB, (c)
"cap"/"subprocess" in stderr.
cargo check --workspace: clean. cargo test -p kei-cortex -p kei-runtime
-- --test-threads=1: 471 pass / 0 fail. Pre-existing openai_loop_wiring.rs
parallel-run flake (state collision when test-threads>1) is unrelated and
unchanged.
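
The reader-side cap from fix 2 can be sketched roughly as below. This is an illustration under assumptions, not the code in invoke_io.rs: read_capped and kill_child are hypothetical helper names, and the reader is generic over std::io::Read so it can be exercised without spawning a process. Only KillHandle and STREAM_CAP come from the description above.

```rust
use std::io::{self, Read};
use std::process::Child;
use std::sync::{Arc, Mutex};

const STREAM_CAP: usize = 16 * 1024 * 1024; // 16 MiB per stream

/// Shared handle so that either reader thread can kill the child.
type KillHandle = Arc<Mutex<Option<Child>>>;

/// Kill the child if it is still registered. Errors are ignored because
/// the child may already have exited. (Hypothetical helper.)
fn kill_child(handle: &KillHandle) {
    if let Some(child) = handle.lock().unwrap().as_mut() {
        let _ = child.kill();
    }
}

/// Read `src` to EOF, keeping at most `cap` bytes.
/// The first time more data arrives than fits, `on_exceed` runs (this is
/// where the child gets killed); the loop then keeps draining and
/// discarding so the closing pipe reaches EOF.
/// Returns the kept bytes and whether truncation happened.
fn read_capped<R: Read>(
    mut src: R,
    cap: usize,
    on_exceed: impl FnOnce(),
) -> io::Result<(Vec<u8>, bool)> {
    let mut kept = Vec::new();
    let mut chunk = [0u8; 8192];
    let mut on_exceed = Some(on_exceed);
    let mut truncated = false;

    loop {
        let n = src.read(&mut chunk)?;
        if n == 0 {
            return Ok((kept, truncated));
        }
        let room = cap - kept.len();
        let take = n.min(room);
        kept.extend_from_slice(&chunk[..take]);
        if take < n {
            truncated = true;
            // Fire the kill exactly once, from inside the reader.
            if let Some(kill) = on_exceed.take() {
                kill();
            }
        }
    }
}
```

Each reader thread would then call something like read_capped(stdout, STREAM_CAP, || kill_child(&handle)) with its own clone of the handle, and the caller would map a true truncation flag to the "exceeded N byte cap" error.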
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three independent security hardenings from cross-cutting audits.
1. cortex /term PTY env leak + bind guard (HIGH — Sonnet Cross-cutting + Opus)
- kei-cortex/src/handlers/term_pty.rs: PTY spawn was inheriting daemon's
full process env (KEI_AUTH_KEY, ANTHROPIC_API_KEY, FAL_KEY, etc.) into
every authenticated /term shell. Combined with default cors_origin =
https://keisei.app, one stored XSS on keisei.app + one bearer token =
full local shell with all daemon secrets.
Added apply_safe_env() helper: env_clear() + re-set only HOME, PATH,
USER, LANG, TERM. Spawn helper invokes it before spawn_command.
- kei-cortex/src/main.rs: extracted build_config() helper; added
enforce_loopback_or_local_cors() guard called before serve.bind. Refuses
to start if bind addr is non-loopback AND cors_origin is a public
domain — prevents the XSS-to-shell scenario in production.
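
The two guards in item 1 could look roughly like the sketch below. It assumes std::process::Command rather than the PTY command builder actually used in term_pty.rs, and a deliberately simple notion of "local" origin (localhost or a loopback IP literal); SAFE_ENV_KEYS and origin_is_local are hypothetical names, and the real checks may differ.

```rust
use std::net::{IpAddr, SocketAddr};
use std::process::Command;

/// Allowlist of variables the spawned shell may inherit.
const SAFE_ENV_KEYS: [&str; 5] = ["HOME", "PATH", "USER", "LANG", "TERM"];

/// Wipe the inherited environment, then copy back only allowlisted keys
/// that are set in the daemon's own environment.
fn apply_safe_env(cmd: &mut Command) {
    cmd.env_clear();
    for key in SAFE_ENV_KEYS {
        if let Some(value) = std::env::var_os(key) {
            cmd.env(key, value);
        }
    }
}

/// Assumption: an origin counts as local when its host is `localhost`
/// or a loopback IP literal. Scheme, port, and path are ignored.
fn origin_is_local(origin: &str) -> bool {
    let rest = origin
        .strip_prefix("https://")
        .or_else(|| origin.strip_prefix("http://"))
        .unwrap_or(origin);
    let authority = rest.split('/').next().unwrap_or("");
    let host = match authority.strip_prefix('[') {
        // Bracketed IPv6 literal such as [::1]:3000
        Some(v6) => v6.split(']').next().unwrap_or(""),
        None => authority.split(':').next().unwrap_or(""),
    };
    host == "localhost"
        || host.parse::<IpAddr>().map(|ip| ip.is_loopback()).unwrap_or(false)
}

/// Refuse the combination "non-loopback bind AND public CORS origin".
fn enforce_loopback_or_local_cors(bind: SocketAddr, cors_origin: &str) -> Result<(), String> {
    if !bind.ip().is_loopback() && !origin_is_local(cors_origin) {
        return Err(format!(
            "refusing to start: bind address {} is not loopback and cors_origin {} is public",
            bind, cors_origin
        ));
    }
    Ok(())
}
```

Using an allowlist rather than a denylist means a secret added to the daemon's environment later is excluded by default.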
2. agent-stub-scan.sh stdin parsing (HIGH — multiple audits)
- hooks/agent-stub-scan.sh: previously read the $CLAUDE_AGENT_TRANSCRIPT env
var, which Claude Code does NOT set on PostToolUse:Agent. The hook silently
exited 0, so RULE 0.16 enforcement was dead code in production.
Rewrote to read stdin JSON via jq, flatten .tool_response recursively
(string|array|object via the same pattern as agent-event-done.sh),
guard on .tool_name == "Agent" and command -v jq. Maintained WARN-tier
exit-0 with TODO marker for ENFORCE flip on 2026-05-05 (per RULE 0.16
§2 ladder).
3. magiclink revoke() silent no-op (HIGH — Opus Rust + Sonnet Cross-cutting)
- kei-auth-magiclink/src/{error,provider}.rs: revoke() previously returned
Ok(()) without doing anything. Operators expecting "revoke a session"
semantics from the AuthProvider trait got false success. Stolen magic-
link URLs remained valid until the 15-minute TTL.
Added Error::Unsupported variant. revoke() now returns
Err(Unsupported(...)) with explicit guidance: "rotate
KEI_MAGICLINK_HMAC_KEY to invalidate all live tokens, or maintain a
deny-list at the caller layer". Test
provider_revoke_returns_unsupported_error confirms the error variant is
wired.
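
As an illustration of item 3, a stripped-down version of the pattern might look like this. The trait shape, the revoke() signature, and MagicLinkProvider are assumptions made for the sketch; only the Error::Unsupported variant and the guidance string come from the description above.

```rust
use std::fmt;

#[derive(Debug)]
enum Error {
    /// The operation exists on the trait but this provider cannot honor it.
    Unsupported(&'static str),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Unsupported(msg) => write!(f, "unsupported: {}", msg),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical, reduced form of the AuthProvider trait.
trait AuthProvider {
    fn revoke(&self, session_id: &str) -> Result<(), Error>;
}

/// Stateless provider: tokens are validated by HMAC alone, so there is
/// no per-session record to delete.
struct MagicLinkProvider;

impl AuthProvider for MagicLinkProvider {
    fn revoke(&self, _session_id: &str) -> Result<(), Error> {
        // Fail loudly instead of returning Ok(()) for a no-op.
        Err(Error::Unsupported(
            "rotate KEI_MAGICLINK_HMAC_KEY to invalidate all live tokens, \
             or maintain a deny-list at the caller layer",
        ))
    }
}
```

Returning an error here lets an operator tell "revoked" apart from "this provider cannot revoke".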
Tests: cargo check + cargo test both PASS. 444 functional tests across
kei-cortex (428 lib) + kei-auth-magiclink (16 lib + smoke). Pre-existing
openai_loop_wiring.rs 502 failures in routes/openai/{chat,responses}.rs are
NOT introduced by these fixes — separate unrelated triage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related changes:
1. Author email update across the kit
- All `info@greendragon.info` references replaced with `parfionovich@keilab.io`
- Touched: NOTICE, README.md, _ts_packages/package.json (and 5 adapter packages),
plus 90+ Cargo.toml files
- Apache-2.0 attribution unchanged (Denis Parfionovich, 2026)
2. Cargo workspace.package SSoT for author/license/repository/homepage
- Added to [workspace.package]:
authors = ["Denis Parfionovich <parfionovich@keilab.io>"]
license = "Apache-2.0"
repository = "https://github.com/KeiSei84/KeiSeiKit-1.0"
homepage = "https://github.com/KeiSei84/KeiSeiKit-1.0"
- All ~89 member crates migrated from inline declarations to:
authors.workspace = true
license.workspace = true
(repository/homepage where applicable)
- Closes audit gap: kei-graph-stream, kei-cortex, kei-shared previously had no
license field at the crate level, blocking `cargo publish` on those.
Now they inherit Apache-2.0 from workspace.
- kei-scheduler/Cargo.toml: removed stray duplicate `authors` line introduced
by an earlier migration sweep.
cargo check --workspace: clean. No code changes; metadata-only migration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>