diff --git a/skills/vm-provision/SKILL.md b/skills/vm-provision/SKILL.md new file mode 100644 index 0000000..80cda54 --- /dev/null +++ b/skills/vm-provision/SKILL.md @@ -0,0 +1,130 @@ +--- +name: vm-provision +description: End-to-end VPS provisioning — select provider → plan → provision → harden → verify (ssh-check + firewall-diff hard-gate) → handoff. 6 phases, ≥6 AskUserQuestion calls, defensive-only. Stops if either verification primitive fails. +argument-hint: +--- + +# /vm-provision — 6-Phase VPS Pipeline (index) + +You turn a short intent ("staging API in EU") into a **hardened, verified +VPS** ready to host an app. Six phases. Every provider choice, plan detail, +and fix is surfaced as an `AskUserQuestion` click — no silent defaults. + +This `SKILL.md` is the INDEX. Each phase lives in its own file, executed in +order. Never skip a phase. Never re-order phases. + +--- + +## Pipeline overview + +| Phase | File | Purpose | AskUserQuestion | +|---|---|---|---| +| 1 | [phase-1-select-provider.md](phase-1-select-provider.md) | Provider + region + plan + ARM/x86 | 2× | +| 2 | [phase-2-plan.md](phase-2-plan.md) | Plan Mode doc: ports, TLS, admin user | 1× | +| 3 | [phase-3-provision.md](phase-3-provision.md) | Provision + SSH first contact | 1× | +| 4 | [phase-4-harden.md](phase-4-harden.md) | Run `harden-base.sh` over SSH | 1× | +| 5 | [phase-5-verify.md](phase-5-verify.md) | `ssh-check` + `firewall-diff` **HARD GATE** | 1× | +| 6 | [phase-6-handoff.md](phase-6-handoff.md) | Artifact list + optional `/web-deploy` | — (final report) | + +**Minimum AskUserQuestion count across a complete pipeline: 6+** — pure- +click contract. Only the intent argument and per-port customisations are +typed. + +--- + +## Hard-Gate Invariant (LOAD-BEARING) + +> **No application is deployed onto a VM that has not passed BOTH +> `ssh-check` (exit 0) and `firewall-diff` (exit 0) in Phase 5.** + +Enforced by Phase 5: + +- `ssh-check --config /etc/ssh/sshd_config --drop-in /etc/ssh/sshd_config.d` → exit 0. +- `ufw status numbered | firewall-diff --intent firewall-intent.yaml --stdin` → exit 0. +- Any non-zero exit → STOP the pipeline; loop back to Phase 4 after the user + approves a remediation path. + +The verify step is DEFENSIVE ONLY (read + parse). It never scans the host +for open CVEs or probes third-party endpoints. + +--- + +## Variables the pipeline produces + +| Name | Set in | Meaning | +|---|---|---| +| `INTENT` | arg | 1-line user description of the target VM | +| `PROVIDER` | Phase 1 | hetzner / vultr / digitalocean / upcloud / linode | +| `REGION` | Phase 1 | provider-specific region code | +| `PLAN` | Phase 1 | cx22 / cax11 / vc2-1c-1gb / … | +| `ARCH` | Phase 1 | x86_64 / arm64 | +| `ADMIN_USER` | Phase 2 | default `keiadmin` | +| `SSH_PORT` | Phase 2 | default 22; custom permitted | +| `APP_PORTS` | Phase 2 | e.g. `[443/tcp, 80/tcp]` | +| `TLS_HOST` | Phase 2 | optional FQDN for Caddy | +| `VM_IP` | Phase 3 | IPv4 of the created VM | +| `VM_NAME` | Phase 3 | provider resource label | +| `HARDENED` | Phase 4 | true when harden-base.sh exited 0 | +| `SSH_CHECK_OK` | Phase 5 | exit 0 of `ssh-check` | +| `FW_DIFF_OK` | Phase 5 | exit 0 of `firewall-diff` | +| `HANDOFF_TO` | Phase 6 | next skill (e.g. `/web-deploy`) or `none` | + +--- + +## Final report (emit after Phase 6) + +``` +=== /VM-PROVISION REPORT === +Intent: +Provider: / region= / plan= / arch= +VM: @ +Admin: (ssh port ) +Ports: +TLS: +Hardened: +Verification: ssh-check= firewall-diff= +Handoff: +Artifacts: +``` + +--- + +## Rules (enforced at every phase) + +- **Pure-click contract.** Only `INTENT` (argument) and custom port values + (Phase 2.c) are typed. Every other decision is an `AskUserQuestion`. +- **Hard gate (Phase 5).** `ssh-check` AND `firewall-diff` must exit 0 + before Phase 6. Neither can be skipped. +- **RULE -1 NO DOWNGRADE.** Any phase that fails returns 2-3 constructive + paths, never "can't be done". +- **RULE 0.8 Secrets Single Source.** All provider tokens come from + `~/.claude/secrets/.env` (or per-project `secrets/*.env`). NEVER read + a token from the conversation, NEVER write one to a file. +- **RULE 0.4 NO HALLUCINATION.** Provider specifics (prices, region codes, + plan IDs) must be fetched at time of use, not recalled. Cite source. +- **RULE 0.5 Plan Mode First.** Phase 2 writes the plan; no provisioning + happens before the user clicks "approve". +- **Defensive-only.** No scanning tools, no CVE probes, no third-party + attack surface analysis. Pure config linting. +- **Surgical changes.** Harden only the VM being provisioned. Never touch + the caller's workstation config. +- **Constructor Pattern (RULE ZERO).** Each phase file ≤ 200 LOC; + generated cloud-init / Caddyfile artefacts never exceed 200 LOC — split + into role-specific files if they would. + +--- + +## References + +- [phase-1-select-provider.md](phase-1-select-provider.md) · [phase-2-plan.md](phase-2-plan.md) · [phase-3-provision.md](phase-3-provision.md) · [phase-4-harden.md](phase-4-harden.md) · [phase-5-verify.md](phase-5-verify.md) · [phase-6-handoff.md](phase-6-handoff.md) +- `_blocks/deploy-hetzner-cloud.md` — Hetzner Cloud specifics (Phase 1) +- `_blocks/deploy-vps-generic.md` — provider-agnostic cloud-init + TF skeleton (Phase 1/3) +- `_blocks/security-ssh-hardening.md` — sshd drop-in baseline (Phase 4/5) +- `_blocks/security-firewall-ufw.md` — ufw intent schema (Phase 2/5) +- `_blocks/security-tls-caddy.md` — TLS (Phase 6 handoff) +- `_blocks/security-audit-logging.md` — auditd baseline (Phase 4) +- `_blocks/security-patching.md` — unattended-upgrades (Phase 4) +- `_primitives/provision-hetzner.sh` · `_primitives/provision-vultr.sh` — provisioners (Phase 3) +- `_primitives/harden-base.sh` — hardening script (Phase 4) +- `_primitives/_rust/ssh-check/` · `_primitives/_rust/firewall-diff/` — verify gate (Phase 5) +- `skills/web-deploy/SKILL.md` — optional Phase 6 handoff diff --git a/skills/vm-provision/phase-1-select-provider.md b/skills/vm-provision/phase-1-select-provider.md new file mode 100644 index 0000000..1e8f23b --- /dev/null +++ b/skills/vm-provision/phase-1-select-provider.md @@ -0,0 +1,103 @@ +# Phase 1 — Select Provider + Region + Plan + +> Goal: lock `PROVIDER`, `REGION`, `PLAN`, `ARCH` via two AskUserQuestion +> calls. No provisioning yet — this is pure decision. +> **Verify criterion:** all four variables set; provider credentials (one +> env-var name) identified in `~/.claude/secrets/.env`. + +--- + +## 1.a — First AskUserQuestion (4 options max) + +**Provider?** (single-select, stored as `PROVIDER`): + +- **Hetzner Cloud** — cheapest EU, CX22 x86 / CAX11 ARM64 both €3.79/mo + [VERIFIED `_blocks/deploy-hetzner-cloud.md`]. Requires `HCLOUD_TOKEN`. +- **Vultr** — broad region list, HF compute, $5-10/mo tiers. Requires + `VULTR_API_KEY`. +- **DigitalOcean** — strong US presence, simple API. Requires + `DIGITALOCEAN_TOKEN`. Uses `deploy-vps-generic.md` cloud-init. +- **UpCloud** — preferred for RU-routed workloads (Finnish ASN). Requires + `UPCLOUD_USERNAME` + `UPCLOUD_PASSWORD`. + +If the intent argument mentions a provider already, pre-select it. + +**Credential check BEFORE the click:** read `~/.claude/secrets/.env`; if the +chosen provider's env var is absent, surface a ONE-line remediation: + +> "Provider X needs `` in `~/.claude/secrets/.env`. Add it and +> re-invoke — I don't accept tokens pasted into chat (RULE 0.8)." + +Do NOT proceed until the token is in place. + +--- + +## 1.b — Second AskUserQuestion (region + plan + arch, 3 Q's) + +Send three questions in one `AskUserQuestion` call. Options are +provider-specific; generate them from the following matrix (do NOT +hallucinate codes — re-verify against the provider doc link on each run): + +**Region** (stored as `REGION`): + +- Hetzner: `fsn1` (Falkenstein DE), `nbg1` (Nürnberg DE), `hel1` (Helsinki + FI), `ash` (Ashburn US), `hil` (Hillsboro US), `sin` (Singapore) + [VERIFIED https://docs.hetzner.com/cloud/general/locations]. +- Vultr: `ams` (Amsterdam), `fra` (Frankfurt), `ewr` (Newark), `lax` + (LA), `nrt` (Tokyo), `sgp` (Singapore). +- DigitalOcean: `nyc1/2/3`, `sfo3`, `ams3`, `fra1`, `lon1`, `sgp1`. +- UpCloud: `de-fra1`, `fi-hel1`, `fi-hel2`, `us-nyc1`, `sg-sin1`. + +Pick the closest region to the user's stated audience. Prefer the EU when +the user doesn't specify (lower GDPR exposure). + +**Plan** (stored as `PLAN`): + +- Hetzner x86: `cx22` (2 vCPU / 4 GB / 40 GB / €3.79/mo), `cx32` (4 vCPU / + 8 GB / €6.79/mo). +- Hetzner ARM: `cax11` (2 vCPU / 4 GB / €3.79/mo), `cax21` (4 vCPU / 8 GB + / €6.49/mo). +- Vultr: `vc2-1c-1gb` ($6/mo), `vc2-2c-4gb` ($24/mo), `vhp-1c-2gb-amd` + ($14/mo). +- DigitalOcean: `s-1vcpu-1gb` ($6/mo), `s-2vcpu-2gb` ($18/mo). +- UpCloud: `1xCPU-1GB`, `2xCPU-2GB`. + +Quote only the plans you can verify against the provider's live pricing +at call-time; do not embed stale pricing as fact. + +**Arch** (stored as `ARCH`): + +- `x86_64` — default; works with every Debian 12 image. +- `arm64` — Hetzner `cax*`, AWS Graviton, Oracle Ampere. ~25% cheaper. + Rust builds run natively; Node/Python binary wheels may need extra + install steps. + +--- + +## 1.c — Verify criterion + +Before moving to Phase 2: + +- [ ] `PROVIDER`, `REGION`, `PLAN`, `ARCH` all set. +- [ ] The provider credential env-var EXISTS in `~/.claude/secrets/.env` + (we only read the env-var name, never the value). +- [ ] The user clicked OK, not "back". + +Emit one-liner: + +`Phase 1 done: PROVIDER= REGION= PLAN= ARCH=. Credentials ref: $.` + +Proceed to Phase 2. + +--- + +## 1.d — Constructive-fail paths + +If the user says "I don't know": + +- **(A)** Default to Hetzner CX22 fsn1 x86 (cheapest EU). 1-click. +- **(B)** Clone an existing project's provider (ask which project, + pattern-match from `~/.claude/projects/*/memory/*.md`). +- **(C)** Defer provisioning — emit a decision memo and exit cleanly. + +Never pick silently. diff --git a/skills/vm-provision/phase-2-plan.md b/skills/vm-provision/phase-2-plan.md new file mode 100644 index 0000000..d991f3c --- /dev/null +++ b/skills/vm-provision/phase-2-plan.md @@ -0,0 +1,137 @@ +# Phase 2 — Plan Mode Doc + +> Goal: produce a written, user-approved plan (RULE 0.5) that enumerates +> every apt change to the VM before any packet leaves the workstation. +> **Verify criterion:** user clicked "approve" on the plan; plan artefact +> exists at `/plan.md`. + +--- + +## 2.a — Synthesise the plan + +Write `/plan.md` (where `` is `./.keisei/vm-provision//`) with +EXACTLY these sections — no more, no less: + +```markdown +# VM-Provision Plan — + +## Intent + + +## Target +- Provider: +- Region: +- Plan: () +- VM name: kei-- # derived, ASK if ambiguous + +## Access +- Admin user: # default keiadmin +- SSH port: # default 22 +- SSH pubkey: # read from ~/.ssh/id_*.pub + +## Ports to allow (ufw + provider cloud firewall) + + +## TLS +- Host: +- Method: + +## Hardening steps (harden-base.sh) +- apt update + upgrade +- install: ufw fail2ban unattended-upgrades needrestart auditd audispd-plugins +- write /etc/ssh/sshd_config.d/99-kei.conf +- ufw default-deny-in + rate-limit ssh + allow APP_PORTS +- fail2ban sshd jail +- auditd baseline ruleset (/etc/audit/rules.d/99-kei.rules) +- unattended-upgrades (AUTO reboot = FALSE) + +## Verification (hard gate before handoff) +- ssh-check → exit 0 +- firewall-diff (intent YAML vs live ufw) → exit 0 + +## Rollback +- `_primitives/provision-.sh destroy ` — 1-command destroy. +- TF state: + +## Cost estimate + +``` + +Cite the source for every price/region/plan detail. Numbers NOT cited = +NO-GO per RULE 0.4. + +--- + +## 2.b — Build the `firewall-intent.yaml` + +Write `/firewall-intent.yaml`: + +```yaml +default: + incoming: deny + outgoing: allow + routed: deny +rules: + - port: + proto: tcp + action: limit + from: any + comment: "ssh (rate-limited)" + # one entry per APP_PORTS: + - port: 443 + proto: tcp + action: allow + from: any +``` + +This file is the **source of truth** the Phase 5 `firewall-diff` will +compare against live `ufw status numbered` output. Drift = Phase 5 fail. + +--- + +## 2.c — AskUserQuestion (customise ports, TLS, admin name) + +One `AskUserQuestion` call with up to 4 questions: + +1. **Admin user?** (stored as `ADMIN_USER`) + - `keiadmin` (default) + - Custom (user types — only free-text in Phase 2) + +2. **SSH port?** (stored as `SSH_PORT`) + - `22` (default; simpler) + - `2222` (obscurity; not security, but reduces log noise) + - Custom + +3. **Application ports to open?** (multi-select, stored as `APP_PORTS`) + - `443/tcp` — HTTPS (most apps) + - `80/tcp` — HTTP (only if ACME HTTP-01 or redirect) + - `none` — tunneled via Tailscale / private net only + +4. **TLS?** (stored as `TLS_HOST` + method) + - Caddy HTTP-01 (need 80/tcp + 443/tcp + DNS pointing to VM) + - Caddy DNS-01 (no port 80 needed; need DNS provider API token) + - None (app provides its own TLS or is behind a proxy) + +--- + +## 2.d — Present the plan for approval + +Render `plan.md` in chat. Ask ONE final AskUserQuestion: + +**Proceed with this plan?** +- Approve → Phase 3. +- Iterate → loop back to 2.c with the user's change request. +- Abort → emit plan-only artefact and exit (`HANDOFF_TO=none`). + +--- + +## 2.e — Verify criterion + +- [ ] `plan.md` written to `/plan.md`. +- [ ] `firewall-intent.yaml` written to `/firewall-intent.yaml`. +- [ ] User clicked "Approve". + +Emit: +`Phase 2 done: plan @ /plan.md. ports, TLS=.` + +Proceed to Phase 3. diff --git a/skills/vm-provision/phase-3-provision.md b/skills/vm-provision/phase-3-provision.md new file mode 100644 index 0000000..3ac919f --- /dev/null +++ b/skills/vm-provision/phase-3-provision.md @@ -0,0 +1,107 @@ +# Phase 3 — Provision + SSH First Contact + +> Goal: create the VM via the right `_primitives/provision-.sh`, +> wait for `cloud-init` to finish, establish SSH as `ADMIN_USER`. +> **Verify criterion:** `VM_IP` resolves to a live sshd that accepts the +> admin key; `cloud-init status --wait` = `done`. + +--- + +## 3.a — Render cloud-init user-data + +Copy `_blocks/deploy-vps-generic.md`'s `cloud-init.yaml` template to +`/cloud-init.yaml`, substituting: + +- `${env}`, `${role}` from Phase 2's derived VM name. +- `${ADMIN_PUBKEY}` — read `~/.ssh/id_ed25519.pub` (or ask Phase 2.c which + pubkey). **NEVER** read private keys; pubkeys only. + +Render once; do not parameterise further — surgical changes only. + +--- + +## 3.b — Choose provisioner + run + +Dispatch by `PROVIDER`: + +- `hetzner` → `_primitives/provision-hetzner.sh create --type --location --user-data /cloud-init.yaml` +- `vultr` → `_primitives/provision-vultr.sh create --plan --region --user-data /cloud-init.yaml` +- `digitalocean` / `upcloud` — use each provider's official CLI directly + (no wrapper primitive yet); CITE the command in the plan before running. + +Both primitives are idempotent — a second invocation with the same name +prints the existing IP and exits 0. Re-runs after a network blip do NOT +create duplicates. + +Capture stdout (just the IPv4) into `VM_IP`. + +--- + +## 3.c — SSH first contact (TOFU) + +```bash +for i in $(seq 1 60); do + ssh -o ConnectTimeout=3 \ + -o StrictHostKeyChecking=accept-new \ + -o UserKnownHostsFile=~/.ssh/known_hosts \ + "${ADMIN_USER}@${VM_IP}" "cloud-init status --wait" && break + sleep 5 +done +``` + +- `StrictHostKeyChecking=accept-new` is TOFU for the FIRST connect only. + After this, subsequent connects use strict mode (default). +- 60 × 5 s = 5 min timeout; long enough for cloud-init on any of the + supported providers. +- `cloud-init status --wait` blocks until cloud-init finishes — no + time-based sleep. + +If the loop exhausts without a successful SSH: STOP. Pull provider +console logs (`hcloud server ssh-log ` / vultr console screenshot) +and surface the failure mode: + +- DNS/IP issue → wait + retry (1 constructive path). +- Wrong pubkey → revoke the VM (`provision-

.sh destroy`), fix Phase 2, + retry. +- Cloud-init crashed on first boot → enable rescue mode via provider + console, read `/var/log/cloud-init-output.log`, fix template, retry. + +--- + +## 3.d — AskUserQuestion (confirm IP + ready to harden) + +One `AskUserQuestion`: + +**VM is up at ``. Cloud-init finished, admin SSH works.** +- Proceed to hardening (Phase 4). +- Pause (inspect the VM first; re-invoke skill when ready). +- Abort + destroy (calls `destroy` on the provisioner, returns to Phase 2). + +--- + +## 3.e — Verify criterion + +- [ ] `VM_IP` set. +- [ ] `cloud-init status` returns `done` (not `error`, not `disabled`). +- [ ] `ssh ${ADMIN_USER}@${VM_IP} 'true'` exits 0. +- [ ] `known_hosts` contains the VM's host key (pinned for future connects). + +Emit: +`Phase 3 done: up @ , admin=, cloud-init=done.` + +Proceed to Phase 4. + +--- + +## 3.f — Constructive-fail paths + +- **Create returned no IP (provisioner exit 2).** Root cause likely API + outage or quota. Paths: (A) retry after 2 min; (B) try sibling region; + (C) fall through to an alternate provider (loops back to Phase 1). +- **cloud-init errored.** Pull logs via rescue; typical causes: bad yaml + indentation, unreachable apt mirror. Fix template; re-provision fresh + (destroy the broken VM first — partial state = harder to reason about). +- **SSH never responded.** Check provider firewall / cloud-init user + creation — some provider images rename `root` → `debian` and our + `keiadmin` sudoers file didn't take. Remediation: add the provider's + default user to the admin whitelist for 1 run, then switch. diff --git a/skills/vm-provision/phase-4-harden.md b/skills/vm-provision/phase-4-harden.md new file mode 100644 index 0000000..5cb4bef --- /dev/null +++ b/skills/vm-provision/phase-4-harden.md @@ -0,0 +1,109 @@ +# Phase 4 — Harden via `harden-base.sh` + +> Goal: run `_primitives/harden-base.sh` on the VM, over SSH, idempotently. +> **Verify criterion:** script exited 0; `systemctl is-active` returns +> `active` for `ssh`, `ufw`, `fail2ban`, `auditd`. + +--- + +## 4.a — Ship the script + +The script lives on the workstation; copy to the VM and run with `sudo`: + +```bash +scp _primitives/harden-base.sh "${ADMIN_USER}@${VM_IP}:/tmp/harden-base.sh" +ssh "${ADMIN_USER}@${VM_IP}" "sudo bash /tmp/harden-base.sh \ + --admin-user ${ADMIN_USER} \ + --ssh-port ${SSH_PORT} \ + $(for p in ${APP_PORTS[@]}; do echo --allow-port $p; done)" +``` + +Why not `curl … | bash`? Because that depends on a hosted URL AND a +trusted TLS cert. `scp` the file you already audited locally. Lower +surface area, reproducible. + +The script is **idempotent** — safe to re-run. Re-runs converge the VM to +the declared state; missing directives get rewritten, extra ones are left +alone. + +--- + +## 4.b — Stream logs + +`harden-base.sh` logs to stderr with timestamps. Capture to +`/harden.log`: + +```bash +ssh "${ADMIN_USER}@${VM_IP}" "sudo bash /tmp/harden-base.sh …" 2> >(tee /harden.log >&2) +``` + +If the script exits non-zero: STOP. Do NOT proceed to Phase 5. Surface +the last 30 lines of `/harden.log` + ask the user to choose: + +- (A) **Fix locally + re-ship** — edit the primitive (if bug is there) or + adjust flags. Commit the fix under `checkpoint:` before retry. +- (B) **Patch the VM manually** — user logs in, fixes, we re-run the + script to ensure idempotency. +- (C) **Destroy + reprovision** — when remediation risk > cost of a + fresh VM (2 min on Hetzner). + +--- + +## 4.c — Post-hardening live-check + +After exit 0, SSH back in and confirm: + +```bash +ssh "${ADMIN_USER}@${VM_IP}" " + set -e + systemctl is-active ssh ufw fail2ban auditd unattended-upgrades.service 2>/dev/null || true + ufw status | head -20 + sudo auditctl -l | head -10 +" +``` + +All four services must be `active`. `auditctl -l` must show the baseline +rules (sshd_config, sudoers, identity, module, time). Record the output +in `/post-harden.txt`. + +--- + +## 4.d — AskUserQuestion (ready to verify?) + +One `AskUserQuestion`: + +**Hardening applied. Four services active; auditd rules loaded.** +- Run verification gate (Phase 5). +- Apply one more pass (typo in `APP_PORTS`, extra user, etc. — loops 4.a + with a delta). +- Pause (leave the VM in current state). + +--- + +## 4.e — Verify criterion + +- [ ] `harden-base.sh` exited 0. +- [ ] `ssh / ufw / fail2ban / auditd` all `active`. +- [ ] `/harden.log` + `/post-harden.txt` captured. + +Emit: +`Phase 4 done: 4/4 services active. Log: /harden.log.` + +Proceed to Phase 5 (hard gate). + +--- + +## 4.f — Non-obvious failure modes + +- **`systemctl reload ssh` fails because `sshd -t` rejects the drop-in.** + Usually a custom `SSH_PORT` collides with ufw still configured for 22. + Fix: ensure ufw rule + sshd Port match BEFORE reload. `harden-base.sh` + writes both in one pass, but if an out-of-band edit happened between + runs, you get this. +- **fail2ban service flaps.** Usually a systemd-journal backend mismatch + on very old Debian. Verify `backend = systemd` in + `/etc/fail2ban/jail.local` (script sets this). +- **auditd refuses `-e 2`.** Means an earlier rules load is still + mastered; `augenrules --load` forces reload. Already in the script. + +None of these require a Level-2 escalation — all three have known fixes. diff --git a/skills/vm-provision/phase-5-verify.md b/skills/vm-provision/phase-5-verify.md new file mode 100644 index 0000000..c228b34 --- /dev/null +++ b/skills/vm-provision/phase-5-verify.md @@ -0,0 +1,112 @@ +# Phase 5 — Verification Hard Gate (`ssh-check` + `firewall-diff`) + +> Goal: fail-closed verification. Phase 6 refuses to run unless BOTH +> `ssh-check` AND `firewall-diff` exit 0. +> **Verify criterion:** `SSH_CHECK_OK = true` AND `FW_DIFF_OK = true`. + +--- + +## 5.a — Pull config artefacts from the VM + +```bash +scp "${ADMIN_USER}@${VM_IP}:/etc/ssh/sshd_config" /sshd_config +ssh "${ADMIN_USER}@${VM_IP}" "sudo tar -C /etc/ssh -cf - sshd_config.d" \ + | tar -C / -xf - +ssh "${ADMIN_USER}@${VM_IP}" "sudo ufw status numbered" > /ufw-status.txt +``` + +The ufw status requires `sudo` on most distros — the admin user has it +via `NOPASSWD:ALL` from `harden-base.sh`. If `sudo` requires TTY, prefix +`sudo -n` and surface the failure. + +All captured files are READ ONLY, for `ssh-check` / `firewall-diff` to +parse. We NEVER push config back from the workstation. + +--- + +## 5.b — Run `ssh-check` + +```bash +_primitives/_rust/ssh-check/target/release/ssh-check \ + --config /sshd_config \ + --drop-in /sshd_config.d \ + --allow-user "${ADMIN_USER}" \ + --json > /ssh-check.json +SSH_EXIT=$? +``` + +Exit 0 → `SSH_CHECK_OK=true`. Exit 2 → `SSH_CHECK_OK=false` and +`/ssh-check.json` lists the violating directives with +`file:line` precision. Exit 1 → usage/parse error; surface the stderr and +loop back to Phase 4. + +--- + +## 5.c — Run `firewall-diff` + +```bash +_primitives/_rust/firewall-diff/target/release/firewall-diff \ + --intent /firewall-intent.yaml \ + --status-file /ufw-status.txt \ + --json > /firewall-diff.json +FW_EXIT=$? +``` + +Exit 0 → `FW_DIFF_OK=true`. Exit 2 → the JSON lists `missing` (in intent, +not live) and `extra` (in live, not intent) rules; `default_mismatches` +flags a non-deny inbound policy. + +--- + +## 5.d — Decision tree + +| `ssh-check` | `firewall-diff` | Action | +|---|---|---| +| 0 | 0 | Proceed to Phase 6. | +| 2 | 0 | Loop to 4.a with the sshd_config.d fix + re-ship `harden-base.sh`. | +| 0 | 2 | Ask user: apply the `missing`/`extra` deltas via `ufw` commands, or update `firewall-intent.yaml` (the intent was wrong). ONE AskUserQuestion. | +| 2 | 2 | Both failed — show both JSON reports; recommend a single fresh `harden-base.sh` re-run first (common-mode fix), then re-verify. | +| 1 | 1 | Workstation issue (missing binary, bad path) — NOT a VM problem. Rebuild the Rust primitives (`cargo build --release` in `_primitives/_rust/`). | + +--- + +## 5.e — The AskUserQuestion + +Exactly ONE AskUserQuestion, gated on the decision tree above: + +**Verification results:** `ssh-check=`, +`firewall-diff=`. Pick one: + +- **Proceed** (only shown when both PASS) → Phase 6. +- **Fix and retry** → loop to Phase 4 (or to 5.c if intent YAML is wrong). +- **Ignore and proceed** — **BLOCKED.** The hard-gate invariant refuses + this path per `SKILL.md`. You can abort, but you cannot bypass. + +--- + +## 5.f — Verify criterion + +- [ ] `ssh-check` exit 0. +- [ ] `firewall-diff` exit 0. +- [ ] `/ssh-check.json` and `/firewall-diff.json` saved. + +Emit: +`Phase 5 done: hard-gate PASSED. Artefacts in /.` + +Proceed to Phase 6. + +--- + +## 5.g — Non-obvious pitfalls + +- **sshd_config.d drop-in not loaded.** Debian 12's default + `/etc/ssh/sshd_config` includes the `.d` directory via an `Include` + directive. We don't follow `Include` on purpose (security — includes + can escape the intended tree). Pass `--drop-in` explicitly. +- **ufw status shows IPv6 rules as duplicates.** Intent is IPv4-only by + default; `firewall-diff`'s normalisation treats `(v6)` rules with same + port/proto as "expected" and does not flag them. If you need strict + v6-only rules, open a separate intent file. +- **`MaxAuthTries` at 6 or 10** (Debian default). `harden-base.sh` sets + 3. If a previous manual edit raised it and we re-ran without rewriting, + ssh-check will FAIL `maxauthtries`. Fix: re-run `harden-base.sh`. diff --git a/skills/vm-provision/phase-6-handoff.md b/skills/vm-provision/phase-6-handoff.md new file mode 100644 index 0000000..4231304 --- /dev/null +++ b/skills/vm-provision/phase-6-handoff.md @@ -0,0 +1,123 @@ +# Phase 6 — Handoff + Final Report + +> Goal: emit a single, complete report and (optionally) hand off to +> `/web-deploy` or `/auth-setup`. No further mutation to the VM from this +> skill. +> **Verify criterion:** final report emitted; all Phase-1..5 artefacts +> listed with absolute paths; next-skill dispatch (if any) announced. + +--- + +## 6.a — Artefact ledger + +Collect and surface: + +- `/plan.md` — Phase 2 +- `/cloud-init.yaml` — Phase 3 input +- `/firewall-intent.yaml` — Phase 2 source of truth +- `/harden.log` — Phase 4 stderr +- `/post-harden.txt` — Phase 4 systemctl snapshot +- `/sshd_config` + `sshd_config.d/` — Phase 5 input (captured) +- `/ufw-status.txt` — Phase 5 input (captured) +- `/ssh-check.json` — Phase 5 output +- `/firewall-diff.json` — Phase 5 output + +Every path must exist on disk before emitting the report. Missing +artefact = bug in an earlier phase; STOP and surface the gap. + +--- + +## 6.b — Final report + +``` +=== /VM-PROVISION REPORT === +Intent: +Provider: / region= / plan= / arch= +VM: @ +Admin: (ssh port ) +Ports: +TLS: +Hardened: +Verification: ssh-check=PASS firewall-diff=PASS +Handoff: +Artefacts: + - /plan.md + - /cloud-init.yaml + - /firewall-intent.yaml + - /harden.log + - /post-harden.txt + - /sshd_config (+ sshd_config.d/) + - /ufw-status.txt + - /ssh-check.json + - /firewall-diff.json +AskUserQuestion count: +``` + +No prose after the ledger. The report is the contract. + +--- + +## 6.c — Handoff (no AskUserQuestion; next-skill dispatch inferred) + +If `TLS_HOST` was set AND the caller's intent mentions deploying an app +— dispatch to `/web-deploy` with the VM IP and admin credentials +(by env-var reference only, RULE 0.8). Surface: + +> `Handoff → /web-deploy --admin --tls ` + +If the intent mentions auth / identity — surface: + +> `Handoff → /auth-setup ` + +Otherwise: `HANDOFF_TO=none`. User invokes the next skill manually when +ready. + +**Never** run the next skill automatically — the user already clicked +their way through 6 phases; handing off to another multi-phase skill +without a pause is hostile UX. + +--- + +## 6.d — Memory save (RULE memory-protocol) + +Append to `memory/{project-or-infra}.md`: + +```markdown +### VM provisioned: (YYYY-MM-DD) [E1] +- Provider: @ +- IP: +- Admin: +- Hardened: harden-base.sh rev +- Verify: ssh-check + firewall-diff both PASS +- Cost: /month (cited @ ) +- Artefacts: / +``` + +Evidence grade E1 — facts are direct observations (we ran the commands, +we have the exit codes, we can re-verify on demand). + +If the project file doesn't exist yet, create `memory/{slug}.md` and add +a single line to `MEMORY.md` under the right section. + +--- + +## 6.e — Verify criterion + +- [ ] Report emitted. +- [ ] All 9+ artefacts exist on disk at absolute paths. +- [ ] `memory/{project}.md` updated (or created) with the provision entry. +- [ ] `HANDOFF_TO` announced (or `none`). + +--- + +## 6.f — Rollback instructions (always include in the report) + +``` +# destroy the VM + all its resources (idempotent) +_primitives/provision-.sh destroy --force + +# purge local artefacts (plan, logs, captured configs) +rm -rf +``` + +Keep them visible — Future-Us will appreciate the 1-command path back.