feat(skills): /vm-provision 6-phase pipeline

Hub-and-spoke skill: - SKILL.md (index) + phase-1-select-provider, phase-2-plan, phase-3-provision, phase-4-harden, phase-5-verify, phase-6-handoff. Pipeline: select provider → Plan Mode doc → provision (hetzner/vultr primitives, SSH first-contact TOFU) → harden-base.sh over SSH → ssh-check + firewall-diff HARD GATE → artefact ledger + optional /web-deploy handoff. Invariants: - ≥ 6 AskUserQuestion calls (Phase 1×2, 2×1, 3×1, 4×1, 5×1). - Hard gate: Phase 6 refuses to run unless ssh-check AND firewall-diff both exit 0. "Ignore and proceed" is BLOCKED by design. - RULE 0.8 (secrets ENV-ref only), RULE 0.4 (cite provider specifics), RULE 0.5 (plan.md written to <run-dir>/plan.md before provisioning), RULE -1 (every failure branch returns 2-3 constructive paths). Defensive-only — no scanning tools, no CVE probes, no third-party attack-surface analysis. Every phase file ≤ 200 LOC per Constructor Pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:00:14 +08:00 · 2026-04-21 21:00:14 +08:00 · eee5eecc20
commit eee5eecc20
parent 521659bbfb
7 changed files with 821 additions and 0 deletions
--- a/skills/vm-provision/SKILL.md
+++ b/skills/vm-provision/SKILL.md
@ -0,0 +1,130 @@
+---
+name: vm-provision
+description: End-to-end VPS provisioning — select provider → plan → provision → harden → verify (ssh-check + firewall-diff hard-gate) → handoff. 6 phases, ≥6 AskUserQuestion calls, defensive-only. Stops if either verification primitive fails.
+argument-hint: <optional one-line intent, e.g. "staging api hetzner eu">
+---
+
+# /vm-provision — 6-Phase VPS Pipeline (index)
+
+You turn a short intent ("staging API in EU") into a **hardened, verified
+VPS** ready to host an app. Six phases. Every provider choice, plan detail,
+and fix is surfaced as an `AskUserQuestion` click — no silent defaults.
+
+This `SKILL.md` is the INDEX. Each phase lives in its own file, executed in
+order. Never skip a phase. Never re-order phases.
+
+---
+
+## Pipeline overview
+
+| Phase | File | Purpose | AskUserQuestion |
+|---|---|---|---|
+| 1 | [phase-1-select-provider.md](phase-1-select-provider.md) | Provider + region + plan + ARM/x86 | 2× |
+| 2 | [phase-2-plan.md](phase-2-plan.md) | Plan Mode doc: ports, TLS, admin user | 1× |
+| 3 | [phase-3-provision.md](phase-3-provision.md) | Provision + SSH first contact | 1× |
+| 4 | [phase-4-harden.md](phase-4-harden.md) | Run `harden-base.sh` over SSH | 1× |
+| 5 | [phase-5-verify.md](phase-5-verify.md) | `ssh-check` + `firewall-diff` **HARD GATE** | 1× |
+| 6 | [phase-6-handoff.md](phase-6-handoff.md) | Artifact list + optional `/web-deploy` | — (final report) |
+
+**Minimum AskUserQuestion count across a complete pipeline: 6+** — pure-
+click contract. Only the intent argument and per-port customisations are
+typed.
+
+---
+
+## Hard-Gate Invariant (LOAD-BEARING)
+
+> **No application is deployed onto a VM that has not passed BOTH
+> `ssh-check` (exit 0) and `firewall-diff` (exit 0) in Phase 5.**
+
+Enforced by Phase 5:
+
+- `ssh-check --config /etc/ssh/sshd_config --drop-in /etc/ssh/sshd_config.d` → exit 0.
+- `ufw status numbered | firewall-diff --intent firewall-intent.yaml --stdin` → exit 0.
+- Any non-zero exit → STOP the pipeline; loop back to Phase 4 after the user
+  approves a remediation path.
+
+The verify step is DEFENSIVE ONLY (read + parse). It never scans the host
+for open CVEs or probes third-party endpoints.
+
+---
+
+## Variables the pipeline produces
+
+| Name | Set in | Meaning |
+|---|---|---|
+| `INTENT` | arg | 1-line user description of the target VM |
+| `PROVIDER` | Phase 1 | hetzner / vultr / digitalocean / upcloud / linode |
+| `REGION`   | Phase 1 | provider-specific region code |
+| `PLAN`     | Phase 1 | cx22 / cax11 / vc2-1c-1gb / …  |
+| `ARCH`     | Phase 1 | x86_64 / arm64 |
+| `ADMIN_USER` | Phase 2 | default `keiadmin` |
+| `SSH_PORT`   | Phase 2 | default 22; custom permitted |
+| `APP_PORTS`  | Phase 2 | e.g. `[443/tcp, 80/tcp]` |
+| `TLS_HOST`   | Phase 2 | optional FQDN for Caddy |
+| `VM_IP`    | Phase 3 | IPv4 of the created VM |
+| `VM_NAME`  | Phase 3 | provider resource label |
+| `HARDENED` | Phase 4 | true when harden-base.sh exited 0 |
+| `SSH_CHECK_OK` | Phase 5 | exit 0 of `ssh-check` |
+| `FW_DIFF_OK`   | Phase 5 | exit 0 of `firewall-diff` |
+| `HANDOFF_TO`   | Phase 6 | next skill (e.g. `/web-deploy`) or `none` |
+
+---
+
+## Final report (emit after Phase 6)
+
+```
+=== /VM-PROVISION REPORT ===
+Intent:        <first 80 chars of INTENT>
+Provider:      <PROVIDER> / region=<REGION> / plan=<PLAN> / arch=<ARCH>
+VM:            <VM_NAME> @ <VM_IP>
+Admin:         <ADMIN_USER> (ssh port <SSH_PORT>)
+Ports:         <APP_PORTS>
+TLS:           <TLS_HOST or "none">
+Hardened:      <HARDENED>
+Verification:  ssh-check=<PASS/FAIL> firewall-diff=<PASS/FAIL>
+Handoff:       <HANDOFF_TO>
+Artifacts:     <terraform state path | cloud-init.yaml path>
+```
+
+---
+
+## Rules (enforced at every phase)
+
+- **Pure-click contract.** Only `INTENT` (argument) and custom port values
+  (Phase 2.c) are typed. Every other decision is an `AskUserQuestion`.
+- **Hard gate (Phase 5).** `ssh-check` AND `firewall-diff` must exit 0
+  before Phase 6. Neither can be skipped.
+- **RULE -1 NO DOWNGRADE.** Any phase that fails returns 2-3 constructive
+  paths, never "can't be done".
+- **RULE 0.8 Secrets Single Source.** All provider tokens come from
+  `~/.claude/secrets/.env` (or per-project `secrets/*.env`). NEVER read
+  a token from the conversation, NEVER write one to a file.
+- **RULE 0.4 NO HALLUCINATION.** Provider specifics (prices, region codes,
+  plan IDs) must be fetched at time of use, not recalled. Cite source.
+- **RULE 0.5 Plan Mode First.** Phase 2 writes the plan; no provisioning
+  happens before the user clicks "approve".
+- **Defensive-only.** No scanning tools, no CVE probes, no third-party
+  attack surface analysis. Pure config linting.
+- **Surgical changes.** Harden only the VM being provisioned. Never touch
+  the caller's workstation config.
+- **Constructor Pattern (RULE ZERO).** Each phase file ≤ 200 LOC;
+  generated cloud-init / Caddyfile artefacts never exceed 200 LOC — split
+  into role-specific files if they would.
+
+---
+
+## References
+
+- [phase-1-select-provider.md](phase-1-select-provider.md) · [phase-2-plan.md](phase-2-plan.md) · [phase-3-provision.md](phase-3-provision.md) · [phase-4-harden.md](phase-4-harden.md) · [phase-5-verify.md](phase-5-verify.md) · [phase-6-handoff.md](phase-6-handoff.md)
+- `_blocks/deploy-hetzner-cloud.md` — Hetzner Cloud specifics (Phase 1)
+- `_blocks/deploy-vps-generic.md` — provider-agnostic cloud-init + TF skeleton (Phase 1/3)
+- `_blocks/security-ssh-hardening.md` — sshd drop-in baseline (Phase 4/5)
+- `_blocks/security-firewall-ufw.md` — ufw intent schema (Phase 2/5)
+- `_blocks/security-tls-caddy.md` — TLS (Phase 6 handoff)
+- `_blocks/security-audit-logging.md` — auditd baseline (Phase 4)
+- `_blocks/security-patching.md` — unattended-upgrades (Phase 4)
+- `_primitives/provision-hetzner.sh` · `_primitives/provision-vultr.sh` — provisioners (Phase 3)
+- `_primitives/harden-base.sh` — hardening script (Phase 4)
+- `_primitives/_rust/ssh-check/` · `_primitives/_rust/firewall-diff/` — verify gate (Phase 5)
+- `skills/web-deploy/SKILL.md` — optional Phase 6 handoff
--- a/skills/vm-provision/phase-1-select-provider.md
+++ b/skills/vm-provision/phase-1-select-provider.md
@ -0,0 +1,103 @@
+# Phase 1 — Select Provider + Region + Plan
+
+> Goal: lock `PROVIDER`, `REGION`, `PLAN`, `ARCH` via two AskUserQuestion
+> calls. No provisioning yet — this is pure decision.
+> **Verify criterion:** all four variables set; provider credentials (one
+> env-var name) identified in `~/.claude/secrets/.env`.
+
+---
+
+## 1.a — First AskUserQuestion (4 options max)
+
+**Provider?** (single-select, stored as `PROVIDER`):
+
+- **Hetzner Cloud** — cheapest EU, CX22 x86 / CAX11 ARM64 both €3.79/mo
+  [VERIFIED `_blocks/deploy-hetzner-cloud.md`]. Requires `HCLOUD_TOKEN`.
+- **Vultr** — broad region list, HF compute, $5-10/mo tiers. Requires
+  `VULTR_API_KEY`.
+- **DigitalOcean** — strong US presence, simple API. Requires
+  `DIGITALOCEAN_TOKEN`. Uses `deploy-vps-generic.md` cloud-init.
+- **UpCloud** — preferred for RU-routed workloads (Finnish ASN). Requires
+  `UPCLOUD_USERNAME` + `UPCLOUD_PASSWORD`.
+
+If the intent argument mentions a provider already, pre-select it.
+
+**Credential check BEFORE the click:** read `~/.claude/secrets/.env`; if the
+chosen provider's env var is absent, surface a ONE-line remediation:
+
+> "Provider X needs `<VAR>` in `~/.claude/secrets/.env`. Add it and
+> re-invoke — I don't accept tokens pasted into chat (RULE 0.8)."
+
+Do NOT proceed until the token is in place.
+
+---
+
+## 1.b — Second AskUserQuestion (region + plan + arch, 3 Q's)
+
+Send three questions in one `AskUserQuestion` call. Options are
+provider-specific; generate them from the following matrix (do NOT
+hallucinate codes — re-verify against the provider doc link on each run):
+
+**Region** (stored as `REGION`):
+
+- Hetzner: `fsn1` (Falkenstein DE), `nbg1` (Nürnberg DE), `hel1` (Helsinki
+  FI), `ash` (Ashburn US), `hil` (Hillsboro US), `sin` (Singapore)
+  [VERIFIED https://docs.hetzner.com/cloud/general/locations].
+- Vultr: `ams` (Amsterdam), `fra` (Frankfurt), `ewr` (Newark), `lax`
+  (LA), `nrt` (Tokyo), `sgp` (Singapore).
+- DigitalOcean: `nyc1/2/3`, `sfo3`, `ams3`, `fra1`, `lon1`, `sgp1`.
+- UpCloud: `de-fra1`, `fi-hel1`, `fi-hel2`, `us-nyc1`, `sg-sin1`.
+
+Pick the closest region to the user's stated audience. Prefer the EU when
+the user doesn't specify (lower GDPR exposure).
+
+**Plan** (stored as `PLAN`):
+
+- Hetzner x86: `cx22` (2 vCPU / 4 GB / 40 GB / €3.79/mo), `cx32` (4 vCPU /
+  8 GB / €6.79/mo).
+- Hetzner ARM: `cax11` (2 vCPU / 4 GB / €3.79/mo), `cax21` (4 vCPU / 8 GB
+  / €6.49/mo).
+- Vultr: `vc2-1c-1gb` ($6/mo), `vc2-2c-4gb` ($24/mo), `vhp-1c-2gb-amd`
+  ($14/mo).
+- DigitalOcean: `s-1vcpu-1gb` ($6/mo), `s-2vcpu-2gb` ($18/mo).
+- UpCloud: `1xCPU-1GB`, `2xCPU-2GB`.
+
+Quote only the plans you can verify against the provider's live pricing
+at call-time; do not embed stale pricing as fact.
+
+**Arch** (stored as `ARCH`):
+
+- `x86_64` — default; works with every Debian 12 image.
+- `arm64` — Hetzner `cax*`, AWS Graviton, Oracle Ampere. ~25% cheaper.
+  Rust builds run natively; Node/Python binary wheels may need extra
+  install steps.
+
+---
+
+## 1.c — Verify criterion
+
+Before moving to Phase 2:
+
+- [ ] `PROVIDER`, `REGION`, `PLAN`, `ARCH` all set.
+- [ ] The provider credential env-var EXISTS in `~/.claude/secrets/.env`
+      (we only read the env-var name, never the value).
+- [ ] The user clicked OK, not "back".
+
+Emit one-liner:
+
+`Phase 1 done: PROVIDER=<x> REGION=<y> PLAN=<z> ARCH=<a>. Credentials ref: $<VAR>.`
+
+Proceed to Phase 2.
+
+---
+
+## 1.d — Constructive-fail paths
+
+If the user says "I don't know":
+
+- **(A)** Default to Hetzner CX22 fsn1 x86 (cheapest EU). 1-click.
+- **(B)** Clone an existing project's provider (ask which project,
+  pattern-match from `~/.claude/projects/*/memory/*.md`).
+- **(C)** Defer provisioning — emit a decision memo and exit cleanly.
+
+Never pick silently.
--- a/skills/vm-provision/phase-2-plan.md
+++ b/skills/vm-provision/phase-2-plan.md
@ -0,0 +1,137 @@
+# Phase 2 — Plan Mode Doc
+
+> Goal: produce a written, user-approved plan (RULE 0.5) that enumerates
+> every apt change to the VM before any packet leaves the workstation.
+> **Verify criterion:** user clicked "approve" on the plan; plan artefact
+> exists at `<run-dir>/plan.md`.
+
+---
+
+## 2.a — Synthesise the plan
+
+Write `<run-dir>/plan.md` (where `<run-dir>` is `./.keisei/vm-provision/<timestamp>/`) with
+EXACTLY these sections — no more, no less:
+
+```markdown
+# VM-Provision Plan — <timestamp>
+
+## Intent
+<INTENT one-line>
+
+## Target
+- Provider: <PROVIDER>
+- Region:   <REGION>
+- Plan:     <PLAN> (<arch>)
+- VM name:  kei-<env>-<role>      # derived, ASK if ambiguous
+
+## Access
+- Admin user:  <ADMIN_USER>       # default keiadmin
+- SSH port:    <SSH_PORT>         # default 22
+- SSH pubkey:  <path>             # read from ~/.ssh/id_*.pub
+
+## Ports to allow (ufw + provider cloud firewall)
+<APP_PORTS — list>
+
+## TLS
+- Host:   <TLS_HOST or none>
+- Method: <HTTP-01 | DNS-01 | none>
+
+## Hardening steps (harden-base.sh)
+- apt update + upgrade
+- install: ufw fail2ban unattended-upgrades needrestart auditd audispd-plugins
+- write /etc/ssh/sshd_config.d/99-kei.conf
+- ufw default-deny-in + rate-limit ssh + allow APP_PORTS
+- fail2ban sshd jail
+- auditd baseline ruleset (/etc/audit/rules.d/99-kei.rules)
+- unattended-upgrades (AUTO reboot = FALSE)
+
+## Verification (hard gate before handoff)
+- ssh-check  → exit 0
+- firewall-diff (intent YAML vs live ufw) → exit 0
+
+## Rollback
+- `_primitives/provision-<provider>.sh destroy <VM_NAME>` — 1-command destroy.
+- TF state: <path or "none — CLI-driven">
+
+## Cost estimate
+<Plan price per month from PROVIDER pricing page; CITE>
+```
+
+Cite the source for every price/region/plan detail. Numbers NOT cited =
+NO-GO per RULE 0.4.
+
+---
+
+## 2.b — Build the `firewall-intent.yaml`
+
+Write `<run-dir>/firewall-intent.yaml`:
+
+```yaml
+default:
+  incoming: deny
+  outgoing: allow
+  routed: deny
+rules:
+  - port: <SSH_PORT>
+    proto: tcp
+    action: limit
+    from: any
+    comment: "ssh (rate-limited)"
+  # one entry per APP_PORTS:
+  - port: 443
+    proto: tcp
+    action: allow
+    from: any
+```
+
+This file is the **source of truth** the Phase 5 `firewall-diff` will
+compare against live `ufw status numbered` output. Drift = Phase 5 fail.
+
+---
+
+## 2.c — AskUserQuestion (customise ports, TLS, admin name)
+
+One `AskUserQuestion` call with up to 4 questions:
+
+1. **Admin user?** (stored as `ADMIN_USER`)
+   - `keiadmin` (default)
+   - Custom (user types — only free-text in Phase 2)
+
+2. **SSH port?** (stored as `SSH_PORT`)
+   - `22` (default; simpler)
+   - `2222` (obscurity; not security, but reduces log noise)
+   - Custom
+
+3. **Application ports to open?** (multi-select, stored as `APP_PORTS`)
+   - `443/tcp` — HTTPS (most apps)
+   - `80/tcp`  — HTTP (only if ACME HTTP-01 or redirect)
+   - `none`    — tunneled via Tailscale / private net only
+
+4. **TLS?** (stored as `TLS_HOST` + method)
+   - Caddy HTTP-01 (need 80/tcp + 443/tcp + DNS pointing to VM)
+   - Caddy DNS-01 (no port 80 needed; need DNS provider API token)
+   - None (app provides its own TLS or is behind a proxy)
+
+---
+
+## 2.d — Present the plan for approval
+
+Render `plan.md` in chat. Ask ONE final AskUserQuestion:
+
+**Proceed with this plan?**
+- Approve → Phase 3.
+- Iterate → loop back to 2.c with the user's change request.
+- Abort → emit plan-only artefact and exit (`HANDOFF_TO=none`).
+
+---
+
+## 2.e — Verify criterion
+
+- [ ] `plan.md` written to `<run-dir>/plan.md`.
+- [ ] `firewall-intent.yaml` written to `<run-dir>/firewall-intent.yaml`.
+- [ ] User clicked "Approve".
+
+Emit:
+`Phase 2 done: plan @ <run-dir>/plan.md. <len(APP_PORTS)> ports, TLS=<method>.`
+
+Proceed to Phase 3.
--- a/skills/vm-provision/phase-3-provision.md
+++ b/skills/vm-provision/phase-3-provision.md
@ -0,0 +1,107 @@
+# Phase 3 — Provision + SSH First Contact
+
+> Goal: create the VM via the right `_primitives/provision-<provider>.sh`,
+> wait for `cloud-init` to finish, establish SSH as `ADMIN_USER`.
+> **Verify criterion:** `VM_IP` resolves to a live sshd that accepts the
+> admin key; `cloud-init status --wait` = `done`.
+
+---
+
+## 3.a — Render cloud-init user-data
+
+Copy `_blocks/deploy-vps-generic.md`'s `cloud-init.yaml` template to
+`<run-dir>/cloud-init.yaml`, substituting:
+
+- `${env}`, `${role}` from Phase 2's derived VM name.
+- `${ADMIN_PUBKEY}` — read `~/.ssh/id_ed25519.pub` (or ask Phase 2.c which
+  pubkey). **NEVER** read private keys; pubkeys only.
+
+Render once; do not parameterise further — surgical changes only.
+
+---
+
+## 3.b — Choose provisioner + run
+
+Dispatch by `PROVIDER`:
+
+- `hetzner` → `_primitives/provision-hetzner.sh create <VM_NAME> --type <PLAN> --location <REGION> --user-data <run-dir>/cloud-init.yaml`
+- `vultr`   → `_primitives/provision-vultr.sh   create <VM_NAME> --plan <PLAN> --region <REGION> --user-data <run-dir>/cloud-init.yaml`
+- `digitalocean` / `upcloud` — use each provider's official CLI directly
+  (no wrapper primitive yet); CITE the command in the plan before running.
+
+Both primitives are idempotent — a second invocation with the same name
+prints the existing IP and exits 0. Re-runs after a network blip do NOT
+create duplicates.
+
+Capture stdout (just the IPv4) into `VM_IP`.
+
+---
+
+## 3.c — SSH first contact (TOFU)
+
+```bash
+for i in $(seq 1 60); do
+  ssh -o ConnectTimeout=3 \
+      -o StrictHostKeyChecking=accept-new \
+      -o UserKnownHostsFile=~/.ssh/known_hosts \
+      "${ADMIN_USER}@${VM_IP}" "cloud-init status --wait" && break
+  sleep 5
+done
+```
+
+- `StrictHostKeyChecking=accept-new` is TOFU for the FIRST connect only.
+  After this, subsequent connects use strict mode (default).
+- 60 × 5 s = 5 min timeout; long enough for cloud-init on any of the
+  supported providers.
+- `cloud-init status --wait` blocks until cloud-init finishes — no
+  time-based sleep.
+
+If the loop exhausts without a successful SSH: STOP. Pull provider
+console logs (`hcloud server ssh-log <name>` / vultr console screenshot)
+and surface the failure mode:
+
+- DNS/IP issue → wait + retry (1 constructive path).
+- Wrong pubkey → revoke the VM (`provision-<p>.sh destroy`), fix Phase 2,
+  retry.
+- Cloud-init crashed on first boot → enable rescue mode via provider
+  console, read `/var/log/cloud-init-output.log`, fix template, retry.
+
+---
+
+## 3.d — AskUserQuestion (confirm IP + ready to harden)
+
+One `AskUserQuestion`:
+
+**VM is up at `<VM_IP>`. Cloud-init finished, admin SSH works.**
+- Proceed to hardening (Phase 4).
+- Pause (inspect the VM first; re-invoke skill when ready).
+- Abort + destroy (calls `destroy` on the provisioner, returns to Phase 2).
+
+---
+
+## 3.e — Verify criterion
+
+- [ ] `VM_IP` set.
+- [ ] `cloud-init status` returns `done` (not `error`, not `disabled`).
+- [ ] `ssh ${ADMIN_USER}@${VM_IP} 'true'` exits 0.
+- [ ] `known_hosts` contains the VM's host key (pinned for future connects).
+
+Emit:
+`Phase 3 done: <VM_NAME> up @ <VM_IP>, admin=<ADMIN_USER>, cloud-init=done.`
+
+Proceed to Phase 4.
+
+---
+
+## 3.f — Constructive-fail paths
+
+- **Create returned no IP (provisioner exit 2).** Root cause likely API
+  outage or quota. Paths: (A) retry after 2 min; (B) try sibling region;
+  (C) fall through to an alternate provider (loops back to Phase 1).
+- **cloud-init errored.** Pull logs via rescue; typical causes: bad yaml
+  indentation, unreachable apt mirror. Fix template; re-provision fresh
+  (destroy the broken VM first — partial state = harder to reason about).
+- **SSH never responded.** Check provider firewall / cloud-init user
+  creation — some provider images rename `root` → `debian` and our
+  `keiadmin` sudoers file didn't take. Remediation: add the provider's
+  default user to the admin whitelist for 1 run, then switch.
--- a/skills/vm-provision/phase-4-harden.md
+++ b/skills/vm-provision/phase-4-harden.md
@ -0,0 +1,109 @@
+# Phase 4 — Harden via `harden-base.sh`
+
+> Goal: run `_primitives/harden-base.sh` on the VM, over SSH, idempotently.
+> **Verify criterion:** script exited 0; `systemctl is-active` returns
+> `active` for `ssh`, `ufw`, `fail2ban`, `auditd`.
+
+---
+
+## 4.a — Ship the script
+
+The script lives on the workstation; copy to the VM and run with `sudo`:
+
+```bash
+scp _primitives/harden-base.sh "${ADMIN_USER}@${VM_IP}:/tmp/harden-base.sh"
+ssh "${ADMIN_USER}@${VM_IP}" "sudo bash /tmp/harden-base.sh \
+  --admin-user ${ADMIN_USER} \
+  --ssh-port  ${SSH_PORT} \
+  $(for p in ${APP_PORTS[@]}; do echo --allow-port $p; done)"
+```
+
+Why not `curl … | bash`? Because that depends on a hosted URL AND a
+trusted TLS cert. `scp` the file you already audited locally. Lower
+surface area, reproducible.
+
+The script is **idempotent** — safe to re-run. Re-runs converge the VM to
+the declared state; missing directives get rewritten, extra ones are left
+alone.
+
+---
+
+## 4.b — Stream logs
+
+`harden-base.sh` logs to stderr with timestamps. Capture to
+`<run-dir>/harden.log`:
+
+```bash
+ssh "${ADMIN_USER}@${VM_IP}" "sudo bash /tmp/harden-base.sh …" 2> >(tee <run-dir>/harden.log >&2)
+```
+
+If the script exits non-zero: STOP. Do NOT proceed to Phase 5. Surface
+the last 30 lines of `<run-dir>/harden.log` + ask the user to choose:
+
+- (A) **Fix locally + re-ship** — edit the primitive (if bug is there) or
+  adjust flags. Commit the fix under `checkpoint:` before retry.
+- (B) **Patch the VM manually** — user logs in, fixes, we re-run the
+  script to ensure idempotency.
+- (C) **Destroy + reprovision** — when remediation risk > cost of a
+  fresh VM (2 min on Hetzner).
+
+---
+
+## 4.c — Post-hardening live-check
+
+After exit 0, SSH back in and confirm:
+
+```bash
+ssh "${ADMIN_USER}@${VM_IP}" "
+  set -e
+  systemctl is-active ssh ufw fail2ban auditd unattended-upgrades.service 2>/dev/null || true
+  ufw status | head -20
+  sudo auditctl -l | head -10
+"
+```
+
+All four services must be `active`. `auditctl -l` must show the baseline
+rules (sshd_config, sudoers, identity, module, time). Record the output
+in `<run-dir>/post-harden.txt`.
+
+---
+
+## 4.d — AskUserQuestion (ready to verify?)
+
+One `AskUserQuestion`:
+
+**Hardening applied. Four services active; auditd rules loaded.**
+- Run verification gate (Phase 5).
+- Apply one more pass (typo in `APP_PORTS`, extra user, etc. — loops 4.a
+  with a delta).
+- Pause (leave the VM in current state).
+
+---
+
+## 4.e — Verify criterion
+
+- [ ] `harden-base.sh` exited 0.
+- [ ] `ssh / ufw / fail2ban / auditd` all `active`.
+- [ ] `<run-dir>/harden.log` + `<run-dir>/post-harden.txt` captured.
+
+Emit:
+`Phase 4 done: 4/4 services active. Log: <run-dir>/harden.log.`
+
+Proceed to Phase 5 (hard gate).
+
+---
+
+## 4.f — Non-obvious failure modes
+
+- **`systemctl reload ssh` fails because `sshd -t` rejects the drop-in.**
+  Usually a custom `SSH_PORT` collides with ufw still configured for 22.
+  Fix: ensure ufw rule + sshd Port match BEFORE reload. `harden-base.sh`
+  writes both in one pass, but if an out-of-band edit happened between
+  runs, you get this.
+- **fail2ban service flaps.** Usually a systemd-journal backend mismatch
+  on very old Debian. Verify `backend = systemd` in
+  `/etc/fail2ban/jail.local` (script sets this).
+- **auditd refuses `-e 2`.** Means an earlier rules load is still
+  mastered; `augenrules --load` forces reload. Already in the script.
+
+None of these require a Level-2 escalation — all three have known fixes.
--- a/skills/vm-provision/phase-5-verify.md
+++ b/skills/vm-provision/phase-5-verify.md
@ -0,0 +1,112 @@
+# Phase 5 — Verification Hard Gate (`ssh-check` + `firewall-diff`)
+
+> Goal: fail-closed verification. Phase 6 refuses to run unless BOTH
+> `ssh-check` AND `firewall-diff` exit 0.
+> **Verify criterion:** `SSH_CHECK_OK = true` AND `FW_DIFF_OK = true`.
+
+---
+
+## 5.a — Pull config artefacts from the VM
+
+```bash
+scp "${ADMIN_USER}@${VM_IP}:/etc/ssh/sshd_config"            <run-dir>/sshd_config
+ssh "${ADMIN_USER}@${VM_IP}" "sudo tar -C /etc/ssh -cf - sshd_config.d" \
+  | tar -C <run-dir>/ -xf -
+ssh "${ADMIN_USER}@${VM_IP}" "sudo ufw status numbered"      > <run-dir>/ufw-status.txt
+```
+
+The ufw status requires `sudo` on most distros — the admin user has it
+via `NOPASSWD:ALL` from `harden-base.sh`. If `sudo` requires TTY, prefix
+`sudo -n` and surface the failure.
+
+All captured files are READ ONLY, for `ssh-check` / `firewall-diff` to
+parse. We NEVER push config back from the workstation.
+
+---
+
+## 5.b — Run `ssh-check`
+
+```bash
+_primitives/_rust/ssh-check/target/release/ssh-check \
+  --config  <run-dir>/sshd_config \
+  --drop-in <run-dir>/sshd_config.d \
+  --allow-user "${ADMIN_USER}" \
+  --json > <run-dir>/ssh-check.json
+SSH_EXIT=$?
+```
+
+Exit 0 → `SSH_CHECK_OK=true`. Exit 2 → `SSH_CHECK_OK=false` and
+`<run-dir>/ssh-check.json` lists the violating directives with
+`file:line` precision. Exit 1 → usage/parse error; surface the stderr and
+loop back to Phase 4.
+
+---
+
+## 5.c — Run `firewall-diff`
+
+```bash
+_primitives/_rust/firewall-diff/target/release/firewall-diff \
+  --intent <run-dir>/firewall-intent.yaml \
+  --status-file <run-dir>/ufw-status.txt \
+  --json > <run-dir>/firewall-diff.json
+FW_EXIT=$?
+```
+
+Exit 0 → `FW_DIFF_OK=true`. Exit 2 → the JSON lists `missing` (in intent,
+not live) and `extra` (in live, not intent) rules; `default_mismatches`
+flags a non-deny inbound policy.
+
+---
+
+## 5.d — Decision tree
+
+| `ssh-check` | `firewall-diff` | Action |
+|---|---|---|
+| 0 | 0 | Proceed to Phase 6. |
+| 2 | 0 | Loop to 4.a with the sshd_config.d fix + re-ship `harden-base.sh`. |
+| 0 | 2 | Ask user: apply the `missing`/`extra` deltas via `ufw` commands, or update `firewall-intent.yaml` (the intent was wrong). ONE AskUserQuestion. |
+| 2 | 2 | Both failed — show both JSON reports; recommend a single fresh `harden-base.sh` re-run first (common-mode fix), then re-verify. |
+| 1 | 1 | Workstation issue (missing binary, bad path) — NOT a VM problem. Rebuild the Rust primitives (`cargo build --release` in `_primitives/_rust/`). |
+
+---
+
+## 5.e — The AskUserQuestion
+
+Exactly ONE AskUserQuestion, gated on the decision tree above:
+
+**Verification results:** `ssh-check=<PASS|FAIL>`,
+`firewall-diff=<PASS|FAIL>`. Pick one:
+
+- **Proceed** (only shown when both PASS) → Phase 6.
+- **Fix and retry** → loop to Phase 4 (or to 5.c if intent YAML is wrong).
+- **Ignore and proceed** — **BLOCKED.** The hard-gate invariant refuses
+  this path per `SKILL.md`. You can abort, but you cannot bypass.
+
+---
+
+## 5.f — Verify criterion
+
+- [ ] `ssh-check` exit 0.
+- [ ] `firewall-diff` exit 0.
+- [ ] `<run-dir>/ssh-check.json` and `<run-dir>/firewall-diff.json` saved.
+
+Emit:
+`Phase 5 done: hard-gate PASSED. Artefacts in <run-dir>/.`
+
+Proceed to Phase 6.
+
+---
+
+## 5.g — Non-obvious pitfalls
+
+- **sshd_config.d drop-in not loaded.** Debian 12's default
+  `/etc/ssh/sshd_config` includes the `.d` directory via an `Include`
+  directive. We don't follow `Include` on purpose (security — includes
+  can escape the intended tree). Pass `--drop-in` explicitly.
+- **ufw status shows IPv6 rules as duplicates.** Intent is IPv4-only by
+  default; `firewall-diff`'s normalisation treats `(v6)` rules with same
+  port/proto as "expected" and does not flag them. If you need strict
+  v6-only rules, open a separate intent file.
+- **`MaxAuthTries` at 6 or 10** (Debian default). `harden-base.sh` sets
+  3. If a previous manual edit raised it and we re-ran without rewriting,
+  ssh-check will FAIL `maxauthtries`. Fix: re-run `harden-base.sh`.
--- a/skills/vm-provision/phase-6-handoff.md
+++ b/skills/vm-provision/phase-6-handoff.md
@ -0,0 +1,123 @@
+# Phase 6 — Handoff + Final Report
+
+> Goal: emit a single, complete report and (optionally) hand off to
+> `/web-deploy` or `/auth-setup`. No further mutation to the VM from this
+> skill.
+> **Verify criterion:** final report emitted; all Phase-1..5 artefacts
+> listed with absolute paths; next-skill dispatch (if any) announced.
+
+---
+
+## 6.a — Artefact ledger
+
+Collect and surface:
+
+- `<run-dir>/plan.md`                      — Phase 2
+- `<run-dir>/cloud-init.yaml`              — Phase 3 input
+- `<run-dir>/firewall-intent.yaml`         — Phase 2 source of truth
+- `<run-dir>/harden.log`                   — Phase 4 stderr
+- `<run-dir>/post-harden.txt`              — Phase 4 systemctl snapshot
+- `<run-dir>/sshd_config` + `sshd_config.d/` — Phase 5 input (captured)
+- `<run-dir>/ufw-status.txt`               — Phase 5 input (captured)
+- `<run-dir>/ssh-check.json`               — Phase 5 output
+- `<run-dir>/firewall-diff.json`           — Phase 5 output
+
+Every path must exist on disk before emitting the report. Missing
+artefact = bug in an earlier phase; STOP and surface the gap.
+
+---
+
+## 6.b — Final report
+
+```
+=== /VM-PROVISION REPORT ===
+Intent:        <first 80 chars of INTENT>
+Provider:      <PROVIDER> / region=<REGION> / plan=<PLAN> / arch=<ARCH>
+VM:            <VM_NAME> @ <VM_IP>
+Admin:         <ADMIN_USER> (ssh port <SSH_PORT>)
+Ports:         <APP_PORTS joined>
+TLS:           <TLS_HOST or "none">
+Hardened:      <HARDENED>
+Verification:  ssh-check=PASS  firewall-diff=PASS
+Handoff:       <HANDOFF_TO>
+Artefacts:
+  - <run-dir>/plan.md
+  - <run-dir>/cloud-init.yaml
+  - <run-dir>/firewall-intent.yaml
+  - <run-dir>/harden.log
+  - <run-dir>/post-harden.txt
+  - <run-dir>/sshd_config        (+ sshd_config.d/)
+  - <run-dir>/ufw-status.txt
+  - <run-dir>/ssh-check.json
+  - <run-dir>/firewall-diff.json
+AskUserQuestion count: <N, should be ≥ 6>
+```
+
+No prose after the ledger. The report is the contract.
+
+---
+
+## 6.c — Handoff (no AskUserQuestion; next-skill dispatch inferred)
+
+If `TLS_HOST` was set AND the caller's intent mentions deploying an app
+— dispatch to `/web-deploy` with the VM IP and admin credentials
+(by env-var reference only, RULE 0.8). Surface:
+
+> `Handoff → /web-deploy <VM_IP> --admin <ADMIN_USER> --tls <TLS_HOST>`
+
+If the intent mentions auth / identity — surface:
+
+> `Handoff → /auth-setup <VM_IP>`
+
+Otherwise: `HANDOFF_TO=none`. User invokes the next skill manually when
+ready.
+
+**Never** run the next skill automatically — the user already clicked
+their way through 6 phases; handing off to another multi-phase skill
+without a pause is hostile UX.
+
+---
+
+## 6.d — Memory save (RULE memory-protocol)
+
+Append to `memory/{project-or-infra}.md`:
+
+```markdown
+### VM provisioned: <VM_NAME> (YYYY-MM-DD) [E1]
+- Provider: <PROVIDER> <PLAN> @ <REGION>
+- IP: <VM_IP>
+- Admin: <ADMIN_USER>
+- Hardened: harden-base.sh rev <git-sha>
+- Verify: ssh-check + firewall-diff both PASS
+- Cost: <X>/month (cited @ <date>)
+- Artefacts: <run-dir>/
+```
+
+Evidence grade E1 — facts are direct observations (we ran the commands,
+we have the exit codes, we can re-verify on demand).
+
+If the project file doesn't exist yet, create `memory/{slug}.md` and add
+a single line to `MEMORY.md` under the right section.
+
+---
+
+## 6.e — Verify criterion
+
+- [ ] Report emitted.
+- [ ] All 9+ artefacts exist on disk at absolute paths.
+- [ ] `memory/{project}.md` updated (or created) with the provision entry.
+- [ ] `HANDOFF_TO` announced (or `none`).
+
+---
+
+## 6.f — Rollback instructions (always include in the report)
+
+```
+# destroy the VM + all its resources (idempotent)
+_primitives/provision-<PROVIDER>.sh destroy <VM_NAME> --force
+
+# purge local artefacts (plan, logs, captured configs)
+rm -rf <run-dir>
+```
+
+Keep them visible — Future-Us will appreciate the 1-command path back.