Single-commit clean baseline after security scrub of niche-tells, project codenames, internal jargon, and contributor-email leaks.

Contents:
- 100 Rust crates (_primitives/_rust/)
- 37 agent manifests (_manifests/) + generated specs (_generated/)
- 67 user-invocable skills (skills/)
- 33 hooks (hooks/)
- Composition blocks (_blocks/)
- Documentation (docs/, README.md)
- TS adapter packages (_ts_packages/)
- Assembler (_assembler/)
- Roles (_roles/)
- Templates (_templates/)
- Forgejo CI (.forgejo/)

Author: Denis Parfionovich <info@greendragon.info>
License: see LICENSE.
DEPLOY — Modal (GPU compute)
A real cost-overrun incident (tens of dollars lost to unchecked runs) and a real KILL-GUARD incident (over an hour of training killed for a non-critical bug) shape every rule below.
Pre-launch 10-step checklist (all ticks before modal run):
- `modal app list`: verify no collisions/duplicates
- GPU compat: A10G torch ≥ 2.0 (≈$1.10/hr), H100 torch ≥ 2.1 (≈$4.50/hr), B200 torch ≥ 2.6 (≈$8/hr)
- `cat` the script: confirm file edits actually landed
- Cost estimate in dollars, verified on live https://modal.com/pricing (NOT from memory)
- Volume + `vol.commit()` after each write
- Checkpoints every 500 steps saving `state_dict` (not just JSON metrics)
- `retries=modal.Retries(max_retries=1)` minimum
- `.spawn()` for batches, NEVER `.map()` (cascade-kill on single failure)
- `flush=True` on every print; progress every 250 steps
- Single-variant smoke run BEFORE fanning out to N variants
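The checkpoint, commit, and progress items in the checklist can be sketched as a plain training loop. This is a minimal illustration, not the project's actual launcher: `run_training`, `train_step`, `save_checkpoint`, and `commit` are hypothetical names, and saving once more at the final step is an added assumption so that a run whose length is not a multiple of 500 still ends with a checkpoint.

```python
from typing import Callable, List

CHECKPOINT_EVERY = 500  # steps between checkpoints (from the checklist)
PROGRESS_EVERY = 250    # steps between progress lines (from the checklist)


def run_training(
    total_steps: int,
    train_step: Callable[[int], float],
    save_checkpoint: Callable[[int], None],
    commit: Callable[[], None],
) -> List[int]:
    """Run steps 1..total_steps and return the steps where a checkpoint was saved.

    train_step, save_checkpoint and commit are hypothetical callbacks supplied
    by the caller; this sketch only fixes the schedule around them.
    """
    saved_at: List[int] = []
    for step in range(1, total_steps + 1):
        loss = train_step(step)

        # Progress line every 250 steps, flushed so it shows up immediately.
        if step % PROGRESS_EVERY == 0:
            print(f"step {step}/{total_steps} loss={loss:.4f}", flush=True)

        # Checkpoint every 500 steps, plus once at the very end (assumption).
        if step % CHECKPOINT_EVERY == 0 or step == total_steps:
            save_checkpoint(step)
            commit()  # persist the write before continuing
            saved_at.append(step)
    return saved_at
```

Inside a Modal function, `save_checkpoint` would typically write `model.state_dict()` to a path under the Volume mount, and `commit` would be the Volume's `vol.commit`.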
Cost tiers: AUTO < $5 · WARN $5-$20 (daily cap $20) · STOP > $20 (explicit user "yes, launch").
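As a rough illustration of how the estimate and the tiers fit together, the sketch below multiplies an hourly rate by expected hours and run count, then classifies the result. The function and constant names are hypothetical. The rates are placeholders copied from the checklist and must be re-checked against the live pricing page. Treating exactly $5 and exactly $20 as WARN is an assumption, and the daily cap is not modelled.

```python
# Placeholder hourly rates copied from the checklist; NOT authoritative.
# Re-check https://modal.com/pricing before every launch.
PLACEHOLDER_RATES_USD_PER_HR = {"A10G": 1.10, "H100": 4.50, "B200": 8.00}

AUTO_LIMIT_USD = 5.0   # below this: AUTO
STOP_LIMIT_USD = 20.0  # above this: STOP, needs an explicit "yes, launch"


def estimate_cost(rate_usd_per_hr: float, hours: float, n_runs: int = 1) -> float:
    """Estimated spend: hourly rate x wall-clock hours x number of runs."""
    if rate_usd_per_hr < 0 or hours < 0 or n_runs < 1:
        raise ValueError("rate and hours must be >= 0 and n_runs must be >= 1")
    return rate_usd_per_hr * hours * n_runs


def cost_tier(estimate_usd: float) -> str:
    """Map a dollar estimate to AUTO / WARN / STOP.

    Boundary handling ($5 and $20 both count as WARN) is an assumption.
    """
    if estimate_usd < AUTO_LIMIT_USD:
        return "AUTO"
    if estimate_usd <= STOP_LIMIT_USD:
        return "WARN"
    return "STOP"
```

For example, three 2-hour runs at a $4.50/hr placeholder rate give `estimate_cost(4.50, 2, 3)`, which is 27.0 and falls in the STOP tier, so the fan-out needs an explicit "yes, launch" even though a single run would only be WARN.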
Anti-stop guard (no exception):
- NEVER `modal app stop`, `modal app kill`, `kill <pid>`, `pkill -f modal` without literal user phrase "yes, stop it".
- Before any stop: `modal app list` → show user what is running, how long in, how much remaining, current checkpoint state.
- A bug in the launching script is NOT a reason to kill a running training run.
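One way to enforce the guard mechanically is a small check that runs before any shell command. The sketch below is an assumption about how such a check could look, and the helper names are hypothetical. It matches the four command shapes listed above and only lets them through when the user's reply is exactly the required phrase (case-sensitive, surrounding whitespace ignored). It guards every `kill` invocation, which is deliberately broader than strictly needed.

```python
import shlex

STOP_PHRASE = "yes, stop it"  # literal phrase required by the guard


def is_stop_command(command: str) -> bool:
    """True if the shell command matches one of the guarded stop/kill shapes."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    # modal app stop ... / modal app kill ...
    if tokens[:2] == ["modal", "app"] and len(tokens) > 2 and tokens[2] in ("stop", "kill"):
        return True
    # kill <pid> (any kill is guarded in this sketch)
    if tokens[0] == "kill":
        return True
    # pkill -f modal
    if tokens[0] == "pkill" and "modal" in tokens[1:]:
        return True
    return False


def stop_allowed(command: str, user_reply: str) -> bool:
    """Allow guarded commands only when the user typed the literal phrase."""
    if not is_stop_command(command):
        return True
    return user_reply.strip() == STOP_PHRASE
```

With this check, `stop_allowed("modal app stop my-app", "ok stop")` is rejected, while read-only commands such as `modal app list` pass without any confirmation.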
Volume persistence: results survive only inside modal.Volume with explicit vol.commit(). Stdout is ephemeral — checkpoints in volume, metrics in volume, logs to volume.
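Because stdout does not survive the container, log lines can be mirrored to a file under the Volume mount. The helper below is a hedged sketch: `make_logger` is a hypothetical name, `log_path` is assumed to point inside a mounted Volume, and `commit` is assumed to be something like the Volume's `vol.commit`. Committing after every line keeps the example simple; a real run might prefer to commit at progress or checkpoint intervals instead.

```python
from pathlib import Path
from typing import Callable, Optional


def make_logger(
    log_path: Path,
    commit: Optional[Callable[[], None]] = None,
) -> Callable[[str], None]:
    """Return log(message) that prints with flush=True and appends to log_path.

    log_path is assumed to live under a Volume mount; commit is an optional
    hypothetical callback that persists the write (e.g. vol.commit).
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)

    def log(message: str) -> None:
        # Stdout copy for live viewing; flushed so it is not held in a buffer.
        print(message, flush=True)
        # Durable copy: append to the log file inside the volume.
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(message + "\n")
        if commit is not None:
            commit()

    return log
```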
Forbidden: guessed prices from memory; .map(return_exceptions=False) for batches; print() without flush=True; launching N variants before one verified single-variant; restarting "for cleanliness" when checkpoints are flowing; stopping a run to fix the launching script.