KeiSeiKit-1.0/_blocks/deploy-modal.md at dc196dc3258ae9f4ce241019bb05a9228b3eebe1

denis 0b901cf2f9 feat: KeiSeiKit v0.1.0 — initial public release

Generic Constructor-Pattern agent kit for Claude Code. Zero personal data,
fully English, MIT-licensed.

Contents:
- 34 reusable blocks (baseline, rules, stack/deploy/domain/api/scraper)
- 14 cross-project agent manifests (code/ml/infra/researcher/critic/...)
- 6 portable skills (/new-agent, /research, /test-gen, /debug-deep, /pr-review, /refactor)
- Rust assembler (single binary, ~500 KB)
- 3 hooks (auto-reassemble, pre-commit validate, no-hand-edit)
- install.sh (idempotent, cargo-builds on first run)
- MIT LICENSE

All 6 sanity greps pass: 0 Russian text, 0 specific project names,
0 incident numbers, 0 user paths, 0 hardcoded IPs, 0 API keys.

cargo check + assemble --validate: both pass on 14 manifests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 23:58:34 +08:00

1.8 KiB

Raw Blame History

A real cost-overrun incident (tens of dollars lost to unchecked runs) and a real KILL-GUARD incident (over an hour of training killed for a non-critical bug) shape every rule below.

Pre-launch 10-step checklist (all ticks before modal run):

modal app list — verify no collisions/duplicates
GPU compat: A10G torch ≥ 2.0 (~~$1.10/hr), H100 torch ≥ 2.1 (~~$4.50/hr), B200 torch ≥ 2.6 (~$8/hr)
cat the script — confirm file edits actually landed
Cost estimate in dollars, verified on live https://modal.com/pricing (NOT from memory)
Volume + vol.commit() after each write
Checkpoints every 500 steps saving state_dict (not just JSON metrics)
retries=modal.Retries(max_retries=1) minimum
.spawn() for batches — NEVER .map() (cascade-kill on single failure)
flush=True on every print; progress every 250 steps
Single-variant smoke run BEFORE fanning out to N variants

Cost tiers: AUTO < $5 · WARN $5-$20 (daily cap $20) · STOP > $20 (explicit user "yes, launch").

KILL GUARD (no exception):

NEVER modal app stop, modal app kill, kill <pid>, pkill -f modal without literal user phrase "yes, stop it".
Before any stop: modal app list → show user what is running, how long in, how much remaining, current checkpoint state.
A bug in the launching script is NOT a reason to kill a running training run.

Volume persistence: results survive only inside modal.Volume with explicit vol.commit(). Stdout is ephemeral — checkpoints in volume, metrics in volume, logs to volume.

Forbidden: guessed prices from memory; .map(return_exceptions=False) for batches; print() without flush=True; launching N variants before one verified single-variant; restarting "for cleanliness" when checkpoints are flowing; stopping a run to fix the launching script.

1.8 KiB Raw Blame History

DEPLOY — Modal (GPU compute)

1.8 KiB

Raw Blame History