Commit graph

7 commits

Author SHA1 Message Date
Parfii-bot
95ccf56988 fix(kei-cortex/test): serial_test on env-mutating openai tests + wiremock warm-up
Previous wiremock conversion fixed the listener-lifecycle race but
left the underlying problem unsolved: `ensure_env()` mutates the
process-global ANTHROPIC_ENDPOINT, and parallel `cargo test` threads
race on that write. Manifested as 502 / "error sending request for
url …" on the first concurrent test pair under both macOS and Linux.

Annotate every #[tokio::test] in openai_loop_wiring.rs +
openai_compat.rs with `#[serial_test::serial]` — these are the only
tests that touch ANTHROPIC_ENDPOINT via shared_mock_anthropic.
serial_test enforces process-wide ordering so the env mutation +
HTTP request pair is atomic per test. All other tests stay parallel.

Stress: 5 parallel `cargo test` runs all green.
2026-05-12 22:26:42 +08:00
Parfii-bot
1b8b2197fb fix(kei-cortex/test): replace hand-rolled mock with wiremock — closes macOS CI flake
Previous `tests/common/mod.rs` spawned a mock Anthropic upstream via
hand-rolled axum + std:🧵:spawn + own current-thread tokio runtime
bound to 127.0.0.1:0. Stable on Linux runner; flaked on macOS GitHub
Actions runners:
  thread 'streaming_responses_runs_real_loop_not_stub' panicked at
  kei-cortex/tests/openai_loop_wiring.rs:277:5:
  no responses delta event in stream: event: response.error
  data: {"error":"model: anthropic request: error sending request
         for url (http://127.0.0.1:49312/v1/messages)"}

Root cause traced to macOS-runner loopback / fd-limit pressure on the
dedicated-thread current-thread runtime. wiremock crate runs a
production-quality hyper-based mock server, manages its own listener
lifecycle, and survives the macOS runner constraints.

## Change

- `Cargo.toml`: add wiremock = workspace dev-dep (already 0.6 in workspace)
- `tests/common/mod.rs::MockAnthropicServer` rebuilt over wiremock::MockServer
- `build_mock(text)` mounts `POST /v1/messages → 200 + canned body` on a
  wiremock instance
- `mock_anthropic_responding_with()` spins one per call on a parked
  helper thread (preserves `MockAnthropicServer: 'static` lifetime for
  `shared_mock_anthropic` `OnceLock` singleton)
- `shared_mock_anthropic()` API unchanged; existing test sites in
  `tests/openai_loop_wiring.rs` + `tests/openai_compat.rs` continue to
  work without modification

## Verification

`cargo test -p kei-cortex --test openai_loop_wiring`: 7/7 pass locally
`cargo test -p kei-cortex`: full suite green (428 lib + integration)

Also includes DNA-INDEX regenerate (auto-encyclopedia hook artefact;
0 vortex matches preserved).
2026-05-12 21:17:58 +08:00
Parfii-bot
e00d25bca6 fix(perf): bound per-user lock LRU + stream-cap atom subprocess output
Two resource-exhaustion fixes from Opus Rust + Sonnet Rust audits.

1. kei-cortex per_user_locks DashMap unbounded growth (HIGH)
   File: kei-cortex/src/state.rs
   Bug: per_user_locks: DashMap<String, Arc<Mutex<()>>> inserted on every
   distinct user_id; never evicted. Auth'd attacker with 1M unique user_ids
   could OOM the daemon (~150 bytes/entry = 15GB at 100M entries).

   Fix: replaced DashMap with tokio::sync::Mutex<LruCache<String,
   Arc<TokioMutex<()>>>> capped at PER_USER_LOCK_CAP = 1024. Eviction is
   safe because callers hold their own Arc clone for their critical section;
   dropping the registry slot retires only the registry's reference. Used
   tokio::sync::Mutex for the registry because LruCache::get mutates the
   recency list and requires &mut self.

   Constructor Pattern: state.rs split into state.rs (184 LOC) +
   state_factories.rs (64 LOC, new). Tests added: user_lock_evicts_past_cap
   (registry stays ≤1024 after 2048 inserts), user_lock_keeps_most_recent
   (LRU recency preserved). Existing user_lock_is_stable_per_user +
   user_lock_differs_per_user updated to async — sole call site
   (handlers/portrait.rs) gains .await.

2. kei-runtime stdout/stderr cap was post-hoc (HIGH)
   File: kei-runtime/src/invoke.rs
   Bug: wait_with_output() buffered ALL child stdout/stderr; only cap_bytes
   truncated AFTER the child finished. A malicious atom writing 10 GB stdout
   (or a buggy one looping infinitely) OOM'd the runtime BEFORE the cap fired.

   Fix: replaced wait_with_output() with two reader threads sharing
   KillHandle = Arc<Mutex<Option<Child>>>. Each reader appends bytes up to
   STREAM_CAP = 16 MiB; on cap exceedance the reader KILLS the child from
   inside the reader thread (critical — otherwise the unbounded writer would
   never EOF and a post-hoc kill would never fire). Both readers drain the
   closing pipe to EOF and return. Truncation surfaces as
   InvokeError::SubprocessError with explicit "exceeded N byte cap" message.

   Constructor Pattern: invoke.rs decomposed into invoke.rs (159 LOC) +
   invoke_io.rs (146 LOC, new) + invoke_error.rs (54 LOC, new). Test added:
   invoke_kills_runaway_atom — stages a kei-flood script running cat
   /dev/zero, verifies (a) non-zero exit, (b) stdout < 18 MiB, (c)
   "cap"/"subprocess" in stderr.

cargo check --workspace: clean. cargo test -p kei-cortex -p kei-runtime
--test-threads=1: 471 pass / 0 fail. Pre-existing openai_loop_wiring.rs
parallel-run flake (state collision when test-threads>1) is unrelated and
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 15:39:50 +08:00
Parfii-bot
c0d900a943 fix(security): cortex /term env_clear + bind guard, agent-stub-scan stdin, magiclink revoke
Three independent security hardenings from cross-cutting audits.

1. cortex /term PTY env leak + bind guard (HIGH — Sonnet Cross-cutting + Opus)
   - kei-cortex/src/handlers/term_pty.rs: PTY spawn was inheriting daemon's
     full process env (KEI_AUTH_KEY, ANTHROPIC_API_KEY, FAL_KEY, etc.) into
     every authenticated /term shell. Combined with default cors_origin =
     https://keisei.app, one stored XSS on keisei.app + one bearer token =
     full local shell with all daemon secrets.
     Added apply_safe_env() helper: env_clear() + re-set only HOME, PATH,
     USER, LANG, TERM. Spawn helper invokes it before spawn_command.
   - kei-cortex/src/main.rs: extracted build_config() helper; added
     enforce_loopback_or_local_cors() guard called before serve.bind. Refuses
     to start if bind addr is non-loopback AND cors_origin is a public
     domain — prevents the XSS-to-shell scenario in production.

2. agent-stub-scan.sh stdin parsing (HIGH — multiple audits)
   - hooks/agent-stub-scan.sh: previously read $CLAUDE_AGENT_TRANSCRIPT env
     var which Claude Code does NOT set on PostToolUse:Agent. Hook silently
     exited 0 — RULE 0.16 enforcement was dead-code in production.
     Rewrote to read stdin JSON via jq, flatten .tool_response recursively
     (string|array|object via the same pattern as agent-event-done.sh),
     guard on .tool_name == "Agent" and command -v jq. Maintained WARN-tier
     exit-0 with TODO marker for ENFORCE flip on 2026-05-05 (per RULE 0.16
     §2 ladder).

3. magiclink revoke() silent no-op (HIGH — Opus Rust + Sonnet Cross-cutting)
   - kei-auth-magiclink/src/{error,provider}.rs: revoke() previously returned
     Ok(()) without doing anything. Operators expecting "revoke a session"
     semantics from the AuthProvider trait got false success. Stolen magic-
     link URLs remained valid until the 15-minute TTL.
     Added Error::Unsupported variant. revoke() now returns
     Err(Unsupported(...)) with explicit guidance: "rotate KEI_MAGICLINK_HMAC_
     KEY to invalidate all live tokens, or maintain a deny-list at the caller
     layer". Test provider_revoke_returns_unsupported_error confirms the
     error variant is wired.

Tests: cargo check + cargo test both PASS. 444 functional tests across
kei-cortex (428 lib) + kei-auth-magiclink (16 lib + smoke). Pre-existing
openai_loop_wiring.rs 502 failures in routes/openai/{chat,responses}.rs are
NOT introduced by these fixes — separate unrelated triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 15:38:23 +08:00
Parfii-bot
fc03c98408 chore: author email + Cargo metadata SSoT (parfionovich@keilab.io)
Two related changes:

1. Author email update across the kit
   - All `info@greendragon.info` references replaced with `parfionovich@keilab.io`
   - Touched: NOTICE, README.md, _ts_packages/package.json (and 5 adapter packages),
     plus 90+ Cargo.toml files
   - Apache-2.0 attribution unchanged (Denis Parfionovich, 2026)

2. Cargo workspace.package SSoT for author/license/repository/homepage
   - Added to [workspace.package]:
     authors    = ["Denis Parfionovich <parfionovich@keilab.io>"]
     license    = "Apache-2.0"
     repository = "https://github.com/KeiSei84/KeiSeiKit-1.0"
     homepage   = "https://github.com/KeiSei84/KeiSeiKit-1.0"
   - All ~89 member crates migrated from inline declarations to:
     authors.workspace    = true
     license.workspace    = true
     (repository/homepage where applicable)
   - Closes audit gap: kei-graph-stream, kei-cortex, kei-shared previously had no
     license field at the crate level, blocking `cargo publish` on those.
     Now they inherit Apache-2.0 from workspace.
   - kei-scheduler/Cargo.toml: removed stray duplicate `authors` line introduced
     by an earlier migration sweep.

cargo check --workspace: clean. No code changes; metadata-only migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:55:28 +08:00
Parfii-bot
03d57c7395 fix(kei-cortex): SSRF + atomic token + body limits + capped reads
Group C — kei-cortex daemon security hardening (post-audit 2026-05-02).

- fal_ssrf.rs (new): validate_fal_url whitelist (fal.ai/.media/.run only).
                      Applied to upload_url, file_url, status_url, images[0].url,
                      and download_image. Closes SSRF where compromised fal response
                      could direct daemon to fetch IMDSv1 (169.254.169.254) and
                      stream cloud creds.
- fal_pipeline.rs (new): HTTP step functions extracted from fal.rs; fal.rs trimmed
                          to thin orchestrator (101 LOC, was over 200 LOC limit).
- auth.rs: save_token now writes to <path>.<nanos>.tmp + sync_all + rename. Was
            non-atomic OpenOptions truncate+write — crash mid-write produced empty
            token file -> bootstrap rotated -> stale clients locked out.
- routes.rs + routes_auth.rs (new): explicit DefaultBodyLimit per route — chat 256 KiB,
                                     tool/apply 11 MiB, pet/interaction 64 KiB, tts 32 KiB.
                                     Bearer auth middleware extracted to routes_auth.
- handlers/chat.rs: validate_body enforces MAX_MESSAGE_CHARS = 50_000. Closed cost
                     amplification where 1.99 MiB chat message billed 500K tokens
                     ($1.50/turn at Sonnet pricing) on every send.
- anthropic_sse.rs: SseParser MAX_BUF = 1 MiB cap; was unbounded — peer streaming
                     1 GB without \\n\\n would OOM daemon.
- http_helpers.rs (new): HTTP_CLIENT: Lazy<reqwest::Client> shared across handlers
                          (was per-request Client::new() => 100-300ms TLS handshake
                          per chat turn, no HTTP/2 multiplexing, fd leak risk on
                          macOS TIME_WAIT).
- http_helpers.rs::read_capped: per-response body cap (16 KiB error / 64 MiB success).
                                  Applied to anthropic, anthropic_invoker, elevenlabs,
                                  fal_pipeline. Closed unbounded resp.text() / .bytes()
                                  pattern that compromised upstream could exploit.

Test results: 462 passed; 0 failed (single-threaded). cargo check clean.
2 pre-existing port-binding flakes in openai_loop_wiring tests are unrelated.

Findings consensus: fal SSRF + body-size + bearer-token-atomicity appeared in
Wave-A retest; chat message cap + SSE buf cap appeared in Wave-A only. Would have
been missed by single audit pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 21:39:57 +08:00
Parfii-bot
a4e667de10 KeiSeiKit-public — clean state
Single-commit clean baseline after security scrub of niche-tells,
project codenames, internal jargon, and contributor-email leaks.

Contents:
- 100 Rust crates (_primitives/_rust/)
- 37 agent manifests (_manifests/) + generated specs (_generated/)
- 67 user-invocable skills (skills/)
- 33 hooks (hooks/)
- Composition blocks (_blocks/)
- Documentation (docs/, README.md)
- TS adapter packages (_ts_packages/)
- Assembler (_assembler/)
- Roles (_roles/)
- Templates (_templates/)
- Forgejo CI (.forgejo/)

Author: Denis Parfionovich <info@greendragon.info>

License: see LICENSE.
2026-05-01 12:09:03 +08:00