diff --git a/_blocks/deploy-hetzner-cloud.md b/_blocks/deploy-hetzner-cloud.md new file mode 100644 index 0000000..00c012f --- /dev/null +++ b/_blocks/deploy-hetzner-cloud.md @@ -0,0 +1,50 @@ +# DEPLOY — Hetzner Cloud (CX22 / CAX11 + TF + Cloud Firewall) + +**Why Hetzner:** one of the cheapest EU VPS providers with a reputable network. CX22 (x86, 2 vCPU / 4 GB / 40 GB) = **€3.79/mo + VAT**; CAX11 (Ampere ARM64, 2 vCPU / 4 GB / 40 GB) = **€3.79/mo + VAT**. Prices [VERIFIED 2026-04-21]. Hourly billing caps at the monthly rate — safe to spin down for tests. + +**Terraform provider:** `hetznercloud/hcloud` (official). Pin version: +```hcl +terraform { + required_providers { + hcloud = { source = "hetznercloud/hcloud", version = "~> 1.49" } + } +} +provider "hcloud" { token = var.hcloud_token } +``` +Token via env: `export HCLOUD_TOKEN=$(grep ^HCLOUD_TOKEN= ~/.claude/secrets/.env | cut -d= -f2-)` for the `hcloud` CLI, then `export TF_VAR_hcloud_token="$HCLOUD_TOKEN"` so Terraform can populate `var.hcloud_token` (the provider block above reads the variable, not the env var). **NEVER commit the token** (RULE 0.8 — see `domain-has-secrets.md`). + +**Minimal `hcloud_server` resource:** +```hcl +resource "hcloud_server" "node" { + name = "kei-${var.env}-${var.role}" + image = "debian-12" + server_type = var.arch == "arm64" ? "cax11" : "cx22" + location = var.location # fsn1 / nbg1 / hel1 / ash / hil / sin + ssh_keys = [hcloud_ssh_key.admin.id] + user_data = file("${path.module}/cloud-init.yaml") + firewall_ids = [hcloud_firewall.base.id] # attach the Cloud Firewall at creation + labels = { project = "kei", env = var.env } +} +``` +`ssh_keys` is **mandatory** — passing it disables the root password e-mail path. If `cloud-init.yaml` is the templated blob from `deploy-vps-generic.md`, render it with `templatefile(...)` instead of `file(...)`; `file()` passes `${ADMIN_PUBKEY}` through literally. + +**Cloud Firewall (stateful, IN by default DENY):** +```hcl +resource "hcloud_firewall" "base" { + name = "kei-base" + # HCL single-line blocks may hold only one argument, so each rule is spelled out. + rule { + direction = "in" + protocol = "tcp" + port = "22" + source_ips = var.admin_cidrs + } + rule { + direction = "in" + protocol = "icmp" + source_ips = ["0.0.0.0/0", "::/0"] + } + # Add app ports (443, 80) only when an app is deployed behind the node. +} +``` +Attach to the server via `firewall_ids = [ … ]`. 
Cloud Firewall is the FIRST line of defense — it drops traffic before it hits the VM's ufw (see `security-firewall-ufw.md`). Both layers MUST agree. + +**Locations:** `fsn1` (Falkenstein DE), `nbg1` (Nürnberg DE), `hel1` (Helsinki FI), `ash` (Ashburn US), `hil` (Hillsboro US), `sin` (Singapore). Pick region closest to users; ARM64 `cax*` available in EU only [VERIFIED 2026-04-21]. + +**Snapshots + rescue:** `hcloud_snapshot` for golden images; `hcloud server enable-rescue` before SSH lockout recovery. Back up `user_data` and TF state (remote backend: S3-compatible such as R2). + +**Primitives provided by KeiSeiKit:** +- `_primitives/provision-hetzner.sh` — wrapper around `hcloud` CLI, idempotent create/destroy, checks existing server by name first. +- Complement with `_primitives/harden-base.sh` run over SSH after first boot. + +**Forbidden:** hcloud token in `.tf` or `.tfvars` committed to git; Cloud Firewall with port 22 open to `0.0.0.0/0`; creating servers with `keep_disk = false` then snapshotting (destroys data); using Hetzner Storage Boxes for anything needing low latency (they're SFTP-over-WAN). diff --git a/_blocks/deploy-vps-generic.md b/_blocks/deploy-vps-generic.md new file mode 100644 index 0000000..8807f87 --- /dev/null +++ b/_blocks/deploy-vps-generic.md @@ -0,0 +1,79 @@ +# DEPLOY — Generic VPS (provider-agnostic cloud-init + ssh-first-contact) + +**Target providers:** DigitalOcean Droplets, Vultr, UpCloud, Linode/Akamai. 
Each has slightly different Terraform providers + CLIs, but the Day-0 contract is identical: **boot a Debian/Ubuntu image with a cloud-init user-data blob; add one admin SSH key; nothing else.** + +**Day-0 cloud-init blob (`cloud-init.yaml`) — universal:** +```yaml +#cloud-config +hostname: kei-${env}-${role} +timezone: UTC +package_update: true +package_upgrade: true +packages: + - ufw + - fail2ban + - unattended-upgrades + - auditd + - needrestart + - curl + - jq +users: + - name: keiadmin + groups: sudo + shell: /bin/bash + sudo: ALL=(ALL) NOPASSWD:ALL + ssh_authorized_keys: + - ${ADMIN_PUBKEY} +ssh_pwauth: false +disable_root: true +write_files: + - path: /etc/ssh/sshd_config.d/99-kei.conf + permissions: '0644' + content: | + PasswordAuthentication no + PermitRootLogin no + MaxAuthTries 3 + AllowUsers keiadmin + ClientAliveInterval 120 + ClientAliveCountMax 2 +runcmd: + - [ systemctl, restart, ssh ] + - [ ufw, default, deny, incoming ] + - [ ufw, default, allow, outgoing ] + - [ ufw, allow, 22/tcp ] + - [ ufw, --force, enable ] +``` +The blob is intentionally provider-neutral. Provider-specific bits (private-network bring-up, metadata service quirks) go in a short appendix the provisioner appends. See `_primitives/harden-base.sh` for post-boot hardening re-runs. + +**SSH-first-contact (`ssh-first-contact.sh` pattern):** +```bash +# Wait for cloud-init to finish AND sshd to be ready on the new IP. +for i in $(seq 1 60); do + ssh -o ConnectTimeout=3 -o StrictHostKeyChecking=accept-new \ + "keiadmin@$IP" "cloud-init status --wait" && break + sleep 5 +done +ssh "keiadmin@$IP" "sudo test -f /var/lib/cloud/instance/boot-finished" +``` +`StrictHostKeyChecking=accept-new` is OK only for the FIRST contact (TOFU). Store the fingerprint to `~/.ssh/known_hosts`; subsequent connects use default strict mode. Never use `StrictHostKeyChecking=no` — accepts MitM silently. 
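
The first-contact loop above falls through silently if all 60 attempts fail, so the follow-up `ssh` is what ends up reporting the problem. A minimal sketch of a retry helper that returns non-zero when attempts are exhausted; the function name `retry` and its argument order are hypothetical, not part of KeiSeiKit:

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
# (Hypothetical helper for illustration only.)
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only between attempts, not after the last one.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage would look like `retry 60 5 ssh -o ConnectTimeout=3 -o StrictHostKeyChecking=accept-new "keiadmin@$IP" "cloud-init status --wait" || exit 1`, which makes the provisioning step fail loudly instead of continuing against a host that never came up.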
+ +**Terraform skeleton (provider-agnostic via vars):** +```hcl +variable "provider_kind" {} # "digitalocean" | "vultr" | "upcloud" | "linode" +variable "region" {} +variable "size_slug" {} # provider-specific size id +variable "admin_pubkey" {} # raw ssh-ed25519 … +variable "env" {} # interpolated into hostname in cloud-init.yaml +variable "role" {} # interpolated into hostname in cloud-init.yaml +locals { + # templatefile() must receive every ${...} the template references, + # otherwise rendering fails: ADMIN_PUBKEY, env and role. + user_data = templatefile("${path.module}/cloud-init.yaml", { + ADMIN_PUBKEY = var.admin_pubkey + env = var.env + role = var.role + }) +} +# ... then a module-per-provider resource that all read `local.user_data` +``` +Keep TF state **local per-env-per-dev by default**; upgrade to remote backend (R2, S3, Terraform Cloud) only when ≥ 2 humans share state. + +**Per-provider gotchas (verified 2026-04-21):** +- **DigitalOcean:** Marketplace "Docker" images skip unattended-upgrades — start from plain Debian 12 instead. IPv6 requires `ipv6 = true` on the droplet. +- **Vultr:** `vultr-cli` needs `VULTR_API_KEY`; default firewall is OPEN — attach a firewall group or rely solely on ufw. +- **UpCloud:** IPs rotate on full stop+start unless you request `floating_ip`. Finnish ASN often preferred over Hetzner in RU-routed workloads (see `project-vortex.md`). +- **Linode:** cloud-init runs before disk resize on some plans → `growpart` may need a rerun on first `ssh`. + +**Forbidden:** baking the admin private key into an AMI/snapshot; reusing one SSH keypair across envs; letting cloud-init pull scripts from a mutable URL (`curl … | bash` in `runcmd:` — pin to a hash); running `apt-get dist-upgrade -y` in `runcmd` without `needrestart` to surface pending reboots. diff --git a/_blocks/security-audit-logging.md b/_blocks/security-audit-logging.md new file mode 100644 index 0000000..27009e3 --- /dev/null +++ b/_blocks/security-audit-logging.md @@ -0,0 +1,77 @@ +# SECURITY — Audit Logging (auditd + journald forwarding) + +**Goal:** every privileged action (sudo, ssh login, sensitive file edit) leaves a tamper-evident trail that survives the VM being reimaged. 
+ +**Stack:** +- `auditd` — Linux kernel audit framework, writes to `/var/log/audit/audit.log` in human-unfriendly but machine-parseable K/V format. +- `journald` — systemd's binary journal (`/var/log/journal/`), captures stdout/stderr of every service plus syslog stream. +- **Off-box shipping** (optional but recommended) — forward journald to a remote log collector (Loki, Vector, rsyslog+TLS). Local logs are destroyed on reimage. + +**Install + enable:** +``` +sudo apt install -y auditd audispd-plugins +sudo systemctl enable --now auditd +``` + +**Reference `/etc/audit/rules.d/99-kei.rules`:** +``` +# KeiSeiKit audit baseline — pinned 2026-04-21. Loaded by augenrules on boot. +## 1. SSH events +-w /etc/ssh/sshd_config -p wa -k sshd_config +-w /etc/ssh/sshd_config.d/ -p wa -k sshd_config +-w /root/.ssh/ -p wa -k ssh_keys_root +-w /home/keiadmin/.ssh/ -p wa -k ssh_keys_admin + +## 2. Sudo events +-w /etc/sudoers -p wa -k sudoers +-w /etc/sudoers.d/ -p wa -k sudoers +-a always,exit -F arch=b64 -S execve -F euid=0 -F auid>=1000 -F auid!=unset -k sudo_root + +## 3. Privilege / identity changes +-w /etc/passwd -p wa -k identity +-w /etc/group -p wa -k identity +-w /etc/shadow -p wa -k identity +-w /etc/gshadow -p wa -k identity + +## 4. Loading / unloading kernel modules +-a always,exit -F arch=b64 -S init_module -S finit_module -S delete_module -k module + +## 5. Time changes (detect attempts to skew audit timestamps) +-a always,exit -F arch=b64 -S adjtimex -S settimeofday -S clock_settime -k time +-w /etc/localtime -p wa -k time + +## 6. Make the config itself immutable (place LAST) +-e 2 +``` +`-e 2` locks the ruleset until reboot (tamper-resistant). Load with `sudo augenrules --load && sudo systemctl restart auditd`. Test with `sudo ausearch -k sshd_config | tail`. + +**Human-readable summaries:** `sudo aureport -au` (auth events), `aureport -m` (module loads), `aureport -k` (keyed rule hits). 
Use these in incident response; raw `audit.log` is only for ingest pipelines. + +**journald tuning — `/etc/systemd/journald.conf.d/99-kei.conf`:** +``` +[Journal] +Storage=persistent +Compress=yes +SystemMaxUse=500M +SystemKeepFree=1G +MaxFileSec=1week +ForwardToSyslog=no +``` +`Storage=persistent` creates `/var/log/journal/` — without it, `journalctl` history disappears on reboot. `MaxFileSec=1week` rotates weekly; combine with off-box shipping so you don't lose events. + +**Off-box shipping patterns:** +- **systemd-journal-upload** — built-in, ships via HTTPS to a `systemd-journal-remote` receiver. Mutual-TLS recommended. +- **Vector** — pull from `journald` source, push to Loki/S3/syslog-TLS. Modern, Rust-native. The source reads the journal by running `journalctl`, so the binary must be present on the host. +- **rsyslog → remote** — legacy path; useful if you already operate a syslog collector. + +Any choice: use TLS, authenticate the receiver, do NOT push cleartext logs across the internet. Logs often contain secrets even when the app tries not to log them. + +**Failure-mode handling:** `auditd` can be configured to panic the kernel when the audit queue fills — reasonable for high-compliance, DANGEROUS for general VMs. Default `/etc/audit/auditd.conf` has `disk_full_action = SUSPEND` and `disk_error_action = SUSPEND` — keep these; tune to `HALT` only if a regulatory driver requires it. + +**Verification (skill Phase 5):** +- `sudo auditctl -l` returns a non-empty rule list. +- `systemctl is-active auditd` = `active`. +- `journalctl --disk-usage` shows a non-zero persistent journal. +- (Optional) an off-box log-receiver shows entries within the last N minutes. + +**Forbidden:** deleting `/var/log/audit/audit.log` or `/var/log/journal/*` on a live host (breaks chain-of-custody); running auditd with `-e 0` (auditing disabled) or leaving it at `-e 1` (enabled but unlocked, so an attacker with root can disable the kernel audit); shipping logs in cleartext; logging secrets (app-level concern — redact before `logger()`); disabling persistent journald. 
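
The Verification (skill Phase 5) checklist above can be scripted roughly as follows. This is a sketch under assumptions: the function names `rules_loaded` and `verify_audit_logging` are hypothetical, and the check relies on `auditctl -l` printing `No rules` when the ruleset is empty.

```bash
# rules_loaded: reads `auditctl -l` output on stdin.
# Succeeds when at least one rule is listed ("No rules" means an empty ruleset).
rules_loaded() {
  local out
  out=$(cat)
  [ -n "$out" ] && [ "$out" != "No rules" ]
}

# verify_audit_logging: runs the local checks from the checklist and
# returns non-zero if any of them fails. (Hypothetical helper.)
verify_audit_logging() {
  local rc=0
  sudo auditctl -l | rules_loaded \
    || { echo "FAIL: no audit rules loaded"; rc=1; }
  [ "$(systemctl is-active auditd)" = "active" ] \
    || { echo "FAIL: auditd is not active"; rc=1; }
  [ -d /var/log/journal ] \
    || { echo "FAIL: journald is not persistent"; rc=1; }
  journalctl --disk-usage
  return "$rc"
}
```

The off-box receiver check is left out on purpose because it depends on which shipping pattern is in use.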
diff --git a/_blocks/security-firewall-ufw.md b/_blocks/security-firewall-ufw.md new file mode 100644 index 0000000..95649de --- /dev/null +++ b/_blocks/security-firewall-ufw.md @@ -0,0 +1,62 @@ +# SECURITY — Firewall (ufw default-deny + rate limiting + nftables alt) + +**Posture — default-deny-in / allow-out:** +``` +ufw default deny incoming +ufw default allow outgoing +ufw default deny routed # do NOT forward unless explicitly routing +ufw limit 22/tcp comment 'ssh (rate-limited: 6 conn / 30s)' +ufw logging medium +ufw --force enable +``` +`ufw limit` = per-source-IP brute-force mitigation at the kernel level (iptables `recent` module). Use for SSH — *never* use it for app traffic (false positives on shared-NAT clients). + +**Layer ordering (read top-down):** +1. **Cloud Firewall** (Hetzner Cloud Firewall / AWS Security Group / DO Firewall) — drops at the provider edge, BEFORE packets hit the VM. Cheapest layer. +2. **ufw** on the VM — defence in depth; also covers provider-firewall misconfigs and private-network paths. +3. **App-level auth** — sshd keys, TLS client certs, app tokens. + +Both the Cloud Firewall AND ufw must agree on the port allow-list. A mismatch means "it works from provider console but not from Tailscale" or vice-versa. Use `_primitives/_rust/firewall-diff/` to compare intended rules (YAML) against running `ufw status`. + +**Intended-rules YAML schema (`firewall-intent.yaml`):** +```yaml +default: + incoming: deny + outgoing: allow + routed: deny +rules: + - port: 22 + proto: tcp + action: limit + from: any + comment: "ssh (rate-limited)" + - port: 443 + proto: tcp + action: allow + from: any + comment: "https / caddy" + - port: 80 + proto: tcp + action: allow + from: any + comment: "http / acme-http-01" +``` +`firewall-diff` round-trips this against live `ufw status numbered` JSON-parse and prints additions/deletions. Exit 0 iff live ≡ intent. + +**Rate limiting patterns:** +- `limit` — built-in; 6 connections / 30 s per IP. Good for SSH. 
+- Per-app — do it inside the app or a reverse proxy (nginx `limit_req`, Caddy `rate_limit`), not in ufw. Kernel rate-limit doesn't understand HTTP methods. +- ICMP — `ufw default allow outgoing` covers outbound; inbound ICMP should be `allow` (echo) for monitoring, NOT blanket-blocked (blocks path-MTU discovery). + +**IPv6:** `/etc/default/ufw` → `IPV6=yes` (default Debian 12). Verify that `ufw status verbose` shows the (v6) rules. Missing IPv6 rules = a trivial bypass on dual-stack VMs. + +**Logging:** `ufw logging medium` writes to `/var/log/ufw.log`. Forward to journald (default on systemd) or an off-box log collector. Logging `high` is too chatty for steady state; use it only during incident response. + +**nftables alternative (for hosts that have Docker-installed iptables-nft):** +ufw is a thin wrapper over iptables/nftables; on Docker-heavy hosts, Docker's daemon aggressively rewrites iptables and can bypass ufw. Two options: +1. **DOCKER_OPTS=`--iptables=false`** (and do NAT yourself — advanced). +2. **`ufw-docker`** companion (not bundled in Debian: pin a tagged release, review the script BEFORE install). + +On non-Docker hosts, ufw is sufficient. On Docker hosts, EITHER isolate (dedicated host + Cloud Firewall only) OR use `ufw-docker` — don't half-configure. + +**Forbidden:** `ufw default allow incoming` "temporarily"; `allow from any to any port 22` without `limit`; skipping the IPv6 rule set; letting Docker silently override ufw without disabling its iptables chain; relying on `ufw` as the ONLY layer when a Cloud Firewall is available. 
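
The posture and IPv6 checks described above can be automated with a small filter over `ufw status verbose`. A minimal sketch, assuming the usual `Default: deny (incoming), allow (outgoing), ...` summary line; the function name `ufw_posture_ok` is hypothetical and this is not a substitute for `firewall-diff`:

```bash
# ufw_posture_ok: reads `ufw status verbose` output on stdin.
# Fails if the default policy is not deny-in / allow-out,
# or if no (v6) rules are present. (Hypothetical helper.)
ufw_posture_ok() {
  local out
  out=$(cat)
  printf '%s\n' "$out" \
    | grep -q '^Default: deny (incoming), allow (outgoing)' \
    || { echo "FAIL: default policy is not deny-in / allow-out"; return 1; }
  printf '%s\n' "$out" | grep -q '(v6)' \
    || { echo "FAIL: no IPv6 rules found"; return 1; }
  echo "ok"
}
```

Run it as `sudo ufw status verbose | ufw_posture_ok`; a non-zero exit means the host deviates from the default-deny posture or is missing its IPv6 rule set.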
diff --git a/_blocks/security-patching.md b/_blocks/security-patching.md new file mode 100644 index 0000000..a0b5a41 --- /dev/null +++ b/_blocks/security-patching.md @@ -0,0 +1,62 @@ +# SECURITY — Patching (unattended-upgrades + needrestart + reboot window) + +**Goal:** security patches applied within 24 h of release, service restarts + kernel reboots happen within a declared maintenance window (NOT ad-hoc at 3 AM UTC on a random Tuesday). + +**Install:** +``` +sudo apt install -y unattended-upgrades needrestart +``` + +**`/etc/apt/apt.conf.d/50unattended-upgrades` (essential lines, Debian 12 / Ubuntu 22.04+):** +``` +Unattended-Upgrade::Origins-Pattern { + "origin=Debian,codename=${distro_codename}-security"; + "origin=Debian,codename=${distro_codename}-updates"; +}; +Unattended-Upgrade::Automatic-Reboot "false"; +Unattended-Upgrade::Automatic-Reboot-Time "04:00"; +Unattended-Upgrade::Mail "admin@example.com"; +Unattended-Upgrade::MailReport "on-change"; +``` +`Automatic-Reboot "false"` is the SAFE default — an automatic reboot without coordination kills in-flight requests. Pair with `needrestart` to SURFACE reboot requirement, then schedule the window explicitly (below). + +**`/etc/apt/apt.conf.d/20auto-upgrades`:** +``` +APT::Periodic::Update-Package-Lists "1"; +APT::Periodic::Unattended-Upgrade "1"; +APT::Periodic::AutocleanInterval "7"; +``` +Triggers daily via `/lib/systemd/system/apt-daily.timer` + `apt-daily-upgrade.timer`. + +**needrestart:** after each upgrade, prints services that loaded old library versions and need restart. `/etc/needrestart/needrestart.conf`: +``` +$nrconf{restart} = 'l'; # list only; do NOT auto-restart services +$nrconf{kernelhints} = -1; # suppress "reboot hint" interactive prompt (non-TTY cron) +``` +`nrconf{restart} = 'a'` (auto) is tempting but dangerous — restarting `postgresql` or a stateful app during a migration = corruption. 
+ +**Reboot window pattern (declared, env-var-driven):** +```bash +# /etc/systemd/system/kei-reboot-window.service + .timer +# Only reboots if /var/run/reboot-required exists AND the current time +# falls inside the declared window. +[Service] +Type=oneshot +EnvironmentFile=/etc/default/kei-reboot-window +ExecStart=/usr/local/bin/kei-reboot-window + +# /etc/default/kei-reboot-window +KEI_REBOOT_DOW="Sun" # day-of-week +KEI_REBOOT_HOUR="04" # 24h, UTC +KEI_REBOOT_MIN="15" +KEI_DRAIN_CMD="" # optional pre-reboot drain (e.g. drain a load-balancer slot) +``` +`kei-reboot-window` script checks `[ -f /var/run/reboot-required ]`, verifies it is the declared DOW/hour, runs `$KEI_DRAIN_CMD`, then `systemctl reboot`. Commit the script once; reuse the env file per-host. + +**Provider-specific:** +- **Hetzner Cloud / Vultr / UpCloud / DigitalOcean / Linode** — nothing extra; cloud-init already installs the packages per `deploy-vps-generic.md`. +- **AWS EC2** — `ec2-instance-connect` may briefly reject SSH during a reboot — tolerate in orchestration retries. + +**Auditability:** `unattended-upgrades` logs to `/var/log/unattended-upgrades/unattended-upgrades.log`. Forward via journald (see `security-audit-logging.md`). Package a short summary in the skill Phase 5 report. + +**Forbidden:** `Unattended-Upgrade::Automatic-Reboot "true"` on stateful services; `$nrconf{restart} = 'a'` on a database host; silently skipping the reboot window to "avoid downtime" (real fix: HA, not skipped patches); installing `.deb` packages from third-party repos without pinning + signature verification; disabling the `apt-daily.timer` — disables ALL security updates. 
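
The `kei-reboot-window` logic described above can be sketched roughly as follows. This is not the shipped script: the function names `in_window` and `main` are hypothetical, the minute field is ignored because the text only mentions DOW/hour, and the fallback defaults mirror the example env file.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of /usr/local/bin/kei-reboot-window.

# in_window NOW_DOW NOW_HOUR WANT_DOW WANT_HOUR
# Succeeds only when the current day-of-week and hour match the declared window.
in_window() {
  [ "$1" = "$3" ] && [ "$2" = "$4" ]
}

main() {
  # No pending reboot: nothing to do.
  [ -f /var/run/reboot-required ] || return 0

  # Compare in UTC with the C locale so %a yields "Sun", "Mon", ...
  local dow hour
  dow=$(LC_ALL=C date -u +%a)
  hour=$(date -u +%H)
  in_window "$dow" "$hour" "${KEI_REBOOT_DOW:-Sun}" "${KEI_REBOOT_HOUR:-04}" \
    || return 0

  # Optional drain step; a failing drain aborts the reboot.
  if [ -n "${KEI_DRAIN_CMD:-}" ]; then
    sh -c "$KEI_DRAIN_CMD" || return 1
  fi
  systemctl reboot
}

# The installed copy would end with a call to `main`; it is left out here
# so the functions can be sourced and exercised without side effects.
```

Keeping the window comparison in its own function makes the time logic testable without touching `/var/run/reboot-required` or rebooting anything.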
diff --git a/_blocks/security-ssh-hardening.md b/_blocks/security-ssh-hardening.md new file mode 100644 index 0000000..2ad075b --- /dev/null +++ b/_blocks/security-ssh-hardening.md @@ -0,0 +1,51 @@ +# SECURITY — SSH Hardening (sshd_config.d/99-kei.conf) + +**Rule:** hardening goes into a drop-in under `/etc/ssh/sshd_config.d/`, NEVER by editing `/etc/ssh/sshd_config` directly. The main file ships with distro-owned defaults; on Debian/Ubuntu it `Include`s the drop-in directory at the top, and because sshd keeps the FIRST value it reads for each directive, drop-ins override the main file and survive package upgrades cleanly. The same rule applies among drop-ins: files are read in lexical order, so a lower-numbered file (e.g. a provider's `50-cloud-init.conf`) beats `99-kei.conf` on any directive both set. Confirm the effective values with `sshd -T`. + +**Reference file `/etc/ssh/sshd_config.d/99-kei.conf`:** +``` +# KeiSeiKit hardened SSH — pinned 2026-04-21, auditable via ssh-check. +Protocol 2 +PasswordAuthentication no +ChallengeResponseAuthentication no +KbdInteractiveAuthentication no +PermitRootLogin prohibit-password +PermitEmptyPasswords no +UsePAM yes +MaxAuthTries 3 +MaxSessions 4 +LoginGraceTime 20 +AllowUsers keiadmin +AllowTcpForwarding no +X11Forwarding no +PermitTunnel no +ClientAliveInterval 120 +ClientAliveCountMax 2 +LogLevel VERBOSE +# Modern crypto only (OpenSSH ≥ 8.9, default Debian 12 / Ubuntu 22.04+): +KexAlgorithms curve25519-sha256,curve25519-sha256@libssh.org,sntrup761x25519-sha512@openssh.com +Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com +MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com +HostKeyAlgorithms ssh-ed25519,rsa-sha2-512,rsa-sha2-256 +``` +Apply with `sshd -t` (config test) before `systemctl reload ssh`. `reload` NOT `restart` — restart kills existing sessions; reload re-reads config while keeping them. + +**Field-by-field rationale:** +- `PasswordAuthentication no` — passwords are the #1 SSH brute-force vector. Keys only. +- `PermitRootLogin prohibit-password` — root only via key, never password. `no` blocks even emergency cloud-console rescue paths on some providers; `prohibit-password` is the pragmatic middle. 
+- `MaxAuthTries 3` — reduces per-connection key/password attempts; combine with fail2ban for per-IP bans (separate concern). +- `AllowUsers keiadmin` — whitelist is simpler than group-based DENY and audits trivially. Adding users = explicit edit. +- `LogLevel VERBOSE` — logs the key fingerprint used; without it you can't tell which admin logged in after compromise. +- `ClientAliveInterval 120` + `ClientAliveCountMax 2` — unresponsive clients are dropped after about 4 minutes (2 unanswered keepalives × 120 s), so a laptop that vanished from the network doesn't leave an open shell. This is a liveness check, not an idle timeout: a connected but idle session stays up. +- `AllowTcpForwarding no` / `PermitTunnel no` — disables SSH-as-VPN. Enable per-use-case via `Match User tunneluser` only. + +**Modern KEX/Cipher/MAC lists (2026-04-21):** +- KEX: `sntrup761x25519-sha512@openssh.com` is post-quantum hybrid (default since OpenSSH 9.0) [VERIFIED https://www.openssh.com/releasenotes.html]; `curve25519-sha256` is the classic ECDH. +- Ciphers: AEAD only (`chacha20-poly1305`, `aes*-gcm`); CBC mode is dropped. Terrapin (CVE-2023-48795) affects `chacha20-poly1305` and CBC with ETM MACs unless both ends support strict KEX (OpenSSH ≥ 9.6 or a distro backport), so keep client and server patched. +- MACs: ETM (Encrypt-Then-MAC) only. Legacy MAC-Then-Encrypt is dropped. +- HostKey: prefer `ssh-ed25519`; keep `rsa-sha2-*` for older client compatibility. Drop `ssh-rsa` (SHA-1, broken). + +**Verification (KeiSeiKit primitive):** +`_primitives/_rust/ssh-check/` parses BOTH `sshd_config` AND every `sshd_config.d/*.conf` (in filename sort order; sshd keeps the FIRST value it reads per directive, so duplicates must be resolved the same way), reports violations of the matrix above with `file:line` precision. Run BEFORE every `systemctl reload ssh` and BEFORE the skill phase-5 verify gate. + +**Forbidden:** editing `/etc/ssh/sshd_config` in-place when a drop-in directory exists; `PermitRootLogin yes`; `PasswordAuthentication yes`; accepting any `diffie-hellman-group1-*` / `ssh-rsa` / CBC ciphers; restarting sshd before `sshd -t` passes; relying on fail2ban alone without key-only auth. 
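
As a cross-check alongside `ssh-check`, the effective merged configuration can be read from `sshd -T`, which prints one `keyword value` pair per line with the keyword in lowercase. A minimal sketch; the helper names `effective_value` and `check_effective` are hypothetical and the snapshot path is only an example:

```bash
# effective_value KEYWORD
# Reads `sshd -T` style output on stdin and prints the value of KEYWORD.
# sshd -T prints keywords in lowercase, so the lookup is case-insensitive.
effective_value() {
  local want
  want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  awk -v k="$want" '$1 == k { $1 = ""; sub(/^ /, ""); print; exit }'
}

# check_effective FILE KEYWORD EXPECTED
# Compares the effective value in FILE (a saved `sshd -T` dump) to EXPECTED.
check_effective() {
  local got
  got=$(effective_value "$2" < "$1")
  if [ "$got" = "$3" ]; then
    echo "ok   $2 $got"
  else
    echo "FAIL $2: expected '$3', got '$got'"
    return 1
  fi
}
```

A typical run would be `sudo sshd -T > /tmp/sshd-effective.txt` followed by `check_effective /tmp/sshd-effective.txt PasswordAuthentication no`. Some directives are printed in a normalized form that may differ from the spelling in the drop-in, so compare against what `sshd -T` actually emits.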
diff --git a/_blocks/security-tls-caddy.md b/_blocks/security-tls-caddy.md new file mode 100644 index 0000000..c7736a3 --- /dev/null +++ b/_blocks/security-tls-caddy.md @@ -0,0 +1,68 @@ +# SECURITY — TLS via Caddy (automatic ACME, HTTP-01 / DNS-01) + +**Why Caddy:** zero-config TLS. Caddy 2 auto-provisions certificates via Let's Encrypt / ZeroSSL on first request for a domain that resolves to it, auto-renews, and stores state under `/var/lib/caddy/`. Official docs: [VERIFIED 2026-04-21]. + +**One-liner install (Debian/Ubuntu, official repo):** +``` +# Pinned to official Cloudsmith repo — NEVER `curl … | bash` a random domain. +sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl +curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' \ + | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg +curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' \ + | sudo tee /etc/apt/sources.list.d/caddy-stable.list +sudo apt update && sudo apt install -y caddy +``` +This installs the `caddy` systemd service owned by `caddy:caddy`. **Never run Caddy as root** — it uses `CAP_NET_BIND_SERVICE` ambient capability to bind low ports. + +**Minimal `/etc/caddy/Caddyfile`:** +``` +{ + # Global options + email admin@example.com # ACME account contact (change!) + # auto_https disable_redirects # uncomment only if fronted by another TLS-terminating proxy +} + +api.example.com { + encode zstd gzip + log { + output file /var/log/caddy/api.log { + roll_size 10mb + roll_keep 10 + } + } + reverse_proxy 127.0.0.1:8080 + header { + Strict-Transport-Security "max-age=31536000; includeSubDomains" + X-Content-Type-Options "nosniff" + Referrer-Policy "strict-origin-when-cross-origin" + -Server + } +} +``` +`caddy validate --config /etc/caddy/Caddyfile` BEFORE `systemctl reload caddy`. Reload ≠ restart; reload is zero-downtime. 
+ +**ACME challenge choice:** +- **HTTP-01** (default) — Caddy binds port 80, LE connects back, serves challenge. Requires: port 80 open to the internet, DNS pointing to the VM. Works for single-host public services. +- **DNS-01** — Caddy writes a TXT record via DNS provider API, doesn't need port 80 open. **Required for wildcard certs** (`*.example.com`) and for LAN-only hosts. Needs a DNS-provider plugin (e.g. `caddy-dns/cloudflare`) compiled into the binary — use `xcaddy build` or the Cloudsmith `caddy-dns-*` packages. + +**DNS-01 with Cloudflare (`caddy-dns/cloudflare`):** +``` +*.internal.example.com, internal.example.com { + tls { + dns cloudflare {env.CF_API_TOKEN} + } + reverse_proxy 127.0.0.1:8080 +} +``` +`CF_API_TOKEN` — store in `/etc/caddy/caddy.env` (chmod 0640, `caddy:caddy`), load via systemd drop-in `EnvironmentFile=`. Never bake the token into the Caddyfile (RULE 0.8 — see `domain-has-secrets.md`). + +**CT log awareness:** every LE cert is published to Certificate Transparency logs. **Any subdomain you cert is publicly searchable** via crt.sh. Use DNS-01 + wildcard for internal services whose names should not leak. + +**Firewall interop (see `security-firewall-ufw.md`):** `ufw allow 80,443/tcp` is required for HTTP-01 and for public HTTPS. Do NOT open 80 if using DNS-01 exclusively and not redirecting HTTP→HTTPS publicly; skip the redirect with `auto_https disable_redirects`. + +**Hardening:** +- `HSTS` as shown above — 1 year, include subdomains. Add `preload` only after submitting to the HSTS preload list. +- `-Server` header strip — removes Caddy version disclosure. +- Rate limit via `caddy-ratelimit` module (needs `xcaddy build` with the plugin) for per-IP throttling; otherwise rely on cloud/ufw layer. 
+ +**Forbidden:** running Caddy as root; embedding DNS/ACME API tokens in the Caddyfile; using `tls internal` (certificates issued by Caddy's local CA, which outside clients do not trust) for anything reachable from outside localhost; skipping `caddy validate` before reload; self-hosting ACME (step-ca is great, but needs its own runbook — out of scope here).