Merge branch 'feat/v0.7-testing-matrix' — 4 blocks + /test-matrix
commit 40d11e7dac
11 changed files with 682 additions and 0 deletions

_blocks/test-e2e.md (Normal file, 53 lines)
@@ -0,0 +1,53 @@
# TEST — End-to-end (Playwright browser automation)

E2E tests drive a real browser against a real deployed stack and assert user-visible behaviour. Slow + flaky by nature — so discipline matters more than count. One reliable E2E beats ten flaky ones.

**Default tool:** `Playwright` (Microsoft, TS/JS/Python/.NET/Java bindings). Preferred over Cypress because: multi-browser (Chromium / Firefox / WebKit), parallel by default, trace viewer (time-travel debugger), auto-waiting for elements, network interception built-in. [E4, playwright.dev]

Cypress is the runner-up; use only if team already owns it. `Selenium` is legacy — avoid for new E2E.

**Scope:**
- E2E = **critical user journeys only** (login, checkout, primary CRUD flow, signup). Target ~5-15 tests, not 500.
- Everything else (form validation, error states, edge cases) → unit + integration + component tests.
- Rule: if a regression here would be a production incident, it's an E2E candidate.

**Page Object pattern (mandatory):**
```ts
import type { Page } from '@playwright/test';

class LoginPage {
  constructor(private page: Page) {}
  // Relative URL: resolved against `baseURL` from the Playwright config.
  async goto() { await this.page.goto('/login'); }
  async login(user: string, pass: string) {
    await this.page.getByLabel('Email').fill(user);
    await this.page.getByLabel('Password').fill(pass);
    await this.page.getByRole('button', { name: 'Sign in' }).click();
  }
}
```
Selectors live in the page object, never in the test. When the UI changes, ONE file updates.

**Selector discipline:**
- Prefer `getByRole` / `getByLabel` / `getByText` (accessibility-anchored, survive CSS refactors).
- Fallback to `data-testid` attributes added purely for tests.
- AVOID CSS class selectors, XPath, nth-child — they break on every style change.

**Test isolation:**
- Each test gets a clean auth state via `storageState` fixtures (login once per project, reuse the cookie jar).
- Each test uses a fresh data scope — either a disposable test tenant, a UUID prefix, or DB truncation in a `beforeEach`.
- NEVER depend on test ordering. Parallel-safe by construction.

**CI headless + tracing:**
- Headless by default, headed only when debugging locally (`--headed --debug`).
- Enable trace on retry: `trace: 'on-first-retry'` — zero overhead on green runs, full forensic on flakes.
- Upload `test-results/` as CI artifact. Open traces with `npx playwright show-trace trace.zip`.
- Video + screenshots on failure: `video: 'retain-on-failure'`, `screenshot: 'only-on-failure'`.

**Flake policy:**
- Retry **at most twice** in CI. If a test retries often, it's a real bug — either in the SUT or the test.
- Quarantine flaky tests (`test.skip()` with a tracked ticket), never silently `retries: 5`.
- Root-cause flakes with the trace viewer, not by adding `waitForTimeout` (always a smell).
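The CI and flake settings above can be kept in one small, testable place. The following is a minimal sketch, assuming a hypothetical `buildE2eConfig` helper whose return value would be handed to Playwright's `defineConfig` in `playwright.config.ts`. The option names (`retries`, `trace`, `video`, `screenshot`, `headless`) are the Playwright ones cited above; the helper and its type are illustrative only.

```typescript
// Hypothetical helper (not part of Playwright): builds the options object
// that would be passed to defineConfig() in playwright.config.ts.
type E2eConfig = {
  retries: number;
  use: {
    headless: boolean;
    trace: "on-first-retry";
    video: "retain-on-failure";
    screenshot: "only-on-failure";
  };
};

function buildE2eConfig(isCI: boolean): E2eConfig {
  return {
    // At most two retries in CI; none locally so flakes surface immediately.
    retries: isCI ? 2 : 0,
    use: {
      headless: true, // pass --headed on the CLI when debugging locally
      trace: "on-first-retry", // a trace is recorded only when a test is retried
      video: "retain-on-failure",
      screenshot: "only-on-failure",
    },
  };
}
```

Keeping the retry count in code review, rather than in an ad-hoc CLI flag, makes a silent bump past two visible in the diff.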

**Forbidden:**
- `page.waitForTimeout(ms)` — use auto-waiting locators or explicit `expect(...).toBeVisible()` polls.
- Running E2E against production without a dedicated test account and a rate limit.
- E2E-testing behaviour already covered by a unit/integration test (slow duplication).
- Hardcoded sleeps, hardcoded URLs, hardcoded user credentials in test files (use fixtures + env vars).

_blocks/test-fuzz.md (Normal file, 32 lines)
@@ -0,0 +1,32 @@
# TEST — Fuzzing (input-space exploration)

Fuzzing throws semi-random inputs at a target to find crashes, panics, hangs, and undefined behaviour the unit-test author never imagined. Complements `test-gen` (happy/edge/error) — fuzz owns the unknown-unknown surface.

**When to fuzz:** parsers, deserializers, protocol handlers, auth/crypto boundaries, any function that accepts untrusted bytes or strings. NOT business logic with well-defined inputs (use property tests instead).

**Per-language tool (default):**
- **Rust:** `cargo-fuzz` (libfuzzer-sys backend) — `cargo fuzz init`, then `fuzz_target!(|data: &[u8]| { my_parser(data); })`. Requires nightly. Harness lives in `fuzz/fuzz_targets/`. [E4, official: https://rust-fuzz.github.io/book/]
- **Python:** `hypothesis` at fuzz volume (`@given` with a raised `max_examples` and `HealthCheck.too_slow` suppressed) for structured inputs; `atheris` (Google, libfuzzer bindings) for bytes-in fuzzing. [E4, hypothesis.readthedocs.io / github.com/google/atheris]
- **JS/TS:** `fast-check` with `fc.assert` using `numRuns: 10_000+` for fuzz-volume runs; `jsfuzz` for libFuzzer-style bytes fuzzing. [E4, fast-check.dev]

**Corpus management:**
- Seed corpus = hand-picked valid inputs (1-10 files). Place under `fuzz/corpus/<target>/`.
- Fuzzer mutates corpus → keeps inputs that hit new coverage → corpus grows.
- Commit corpus to git (gitignore `fuzz/artifacts/`). Treat as test fixture.
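The "mutate, keep what reaches new coverage, grow the corpus" cycle can be illustrated with a toy loop. This is a sketch of the idea only, not how libFuzzer or `cargo-fuzz` are implemented: `targetCoverage` is a hypothetical stand-in that reports which branches an input reaches, and the byte mutator and PRNG are deliberately simplistic.

```typescript
// Hypothetical target: reports which "branches" an input reaches.
// Real fuzzers get this from compiler instrumentation instead.
function targetCoverage(input: number[]): Set<string> {
  const hit = new Set<string>(["entry"]);
  if (input.length > 0 && input[0] === 0x7f) {
    hit.add("magic");
    if (input.length > 1 && input[1] > 0x80) hit.add("long-form");
  }
  return hit;
}

// Small deterministic PRNG (linear congruential) so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

// Overwrite one byte of the parent input, or append a byte.
function mutate(input: number[], rng: () => number): number[] {
  const out = input.slice();
  const byte = Math.floor(rng() * 256);
  if (out.length === 0 || rng() < 0.3) out.push(byte);
  else out[Math.floor(rng() * out.length)] = byte;
  return out;
}

function fuzzLoop(seedCorpus: number[][], iterations: number, seed: number): number[][] {
  const rng = makeRng(seed);
  const corpus = seedCorpus.map((s) => s.slice());
  const seen = new Set<string>();
  for (const s of corpus) targetCoverage(s).forEach((b) => seen.add(b));
  for (let i = 0; i < iterations; i++) {
    const parent = corpus[Math.floor(rng() * corpus.length)];
    const child = mutate(parent, rng);
    const fresh = Array.from(targetCoverage(child)).filter((b) => !seen.has(b));
    if (fresh.length > 0) {
      fresh.forEach((b) => seen.add(b));
      corpus.push(child); // keep only inputs that reached something new
    }
  }
  return corpus;
}
```

Because only coverage-increasing inputs are kept, the corpus stays small relative to the number of executions, which is what makes committing it to git practical.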

**Crash triage:**
1. Fuzzer dumps crash input under `fuzz/artifacts/<target>/crash-<hash>`.
2. Reproduce: `cargo fuzz run <target> fuzz/artifacts/<target>/crash-<hash>`.
3. Minimize: `cargo fuzz tmin <target> <input>` — shrinks to minimal reproducer.
4. Write a regression unit test using the minimized input BEFORE fixing the bug. The regression test is permanent; the crash artifact is ephemeral (`fuzz/artifacts/` is gitignored).

**CI integration (budget-aware):**
- Short CI run: 60s per target on every PR. Catches regressions, not deep bugs.
- Nightly run: 1-4h per target on schedule. Upload crashes as artifacts.
- OSS-Fuzz (free for OSS): submit a `project.yaml` + Dockerfile + build script; Google runs fuzzing on their infra. [E4, google.github.io/oss-fuzz]

**Forbidden:**
- Fuzzing without a crash-reproducer harness (crashes become irreproducible).
- Running fuzzer without `cargo fuzz tmin` / equivalent — full-size crashes waste reviewer time.
- Committing `fuzz/artifacts/` (binary crash bodies, repo bloat).
- Treating a fuzz hit as "flaky" — every crash is a bug until minimized + explained.

_blocks/test-load.md (Normal file, 48 lines)
@@ -0,0 +1,48 @@
# TEST — Load / performance testing (baseline → profile → fix)

Load tests answer: "how much traffic does this system handle before SLO violation?" Not "does it work" (unit/integration) but "does it stay up under N RPS for T minutes with p99 < X ms". The loop is **baseline → profile → fix → re-baseline**, never "run once and ship".

**Tool choice (default):**
- **`k6`** (Grafana, JS scripting) — best for HTTP/REST/WS APIs with scripted scenarios + thresholds; built-in SLO assertions; Docker-friendly. [E4, k6.io]
- **`vegeta`** (Go, CLI) — simplest constant-rate HTTP attacker; great for flat-load smoke tests; pipes into plots. [E4, github.com/tsenart/vegeta]
- **`oha`** (Rust) — modern `hey` replacement, good for quick local baselines, HTTP/2 + HTTP/3. [E4, github.com/hatoot/oha]
- **`hyperfine`** (Rust) — microbenchmark CLI for single commands / binaries; NOT a web load tool. Use for build-time, cold-start, compile-speed measurements. [E4, github.com/sharkdp/hyperfine]

**SLO definition (write BEFORE running):**
1. **Latency:** p50 < A ms, p95 < B ms, p99 < C ms (p99 is the user-felt number).
2. **Throughput:** sustain N RPS for T minutes without error budget burn.
3. **Error rate:** < 0.1% 5xx, < 1% 4xx (excluding user errors).
4. **Resource:** CPU < 70%, memory < 80% of instance, no OOM kills.

Without SLOs written down, "the test passes" is meaningless.
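To make "written down" concrete, the latency and error-rate parts of an SLO can be expressed as data and checked mechanically. A minimal sketch follows, assuming latencies are collected as one number per request in milliseconds; the `Slo` shape and the `percentile` / `checkSlo` names are illustrative, and the nearest-rank percentile method is one common choice among several.

```typescript
type Slo = { p50Ms: number; p95Ms: number; p99Ms: number; maxErrorRate: number };

// Nearest-rank percentile over recorded latencies (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Returns one message per violated limit; an empty array means the SLO held.
// Assumes `latenciesMs` has one entry per request, failed requests included.
function checkSlo(latenciesMs: number[], errors: number, slo: Slo): string[] {
  const violations: string[] = [];
  const checks: [string, number, number][] = [
    ["p50", percentile(latenciesMs, 50), slo.p50Ms],
    ["p95", percentile(latenciesMs, 95), slo.p95Ms],
    ["p99", percentile(latenciesMs, 99), slo.p99Ms],
  ];
  for (const [name, actual, limit] of checks) {
    if (actual >= limit) violations.push(`${name} ${actual}ms >= ${limit}ms`);
  }
  const errorRate = errors / latenciesMs.length;
  if (errorRate >= slo.maxErrorRate) {
    violations.push(`error rate ${errorRate} >= ${slo.maxErrorRate}`);
  }
  return violations;
}
```

Resource limits (CPU, memory, OOM kills) come from the host or orchestrator rather than from request samples, so they are left out of this sketch.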

**The loop:**
1. **Baseline:** lowest realistic load (10 RPS for 1 min). Record latency histogram, CPU, memory. This is the "no-load" floor.
2. **Ramp:** step-up load (10 → 50 → 100 → 200 RPS, 2 min each). Find the knee — where p99 doubles or errors appear.
3. **Profile at the knee:** attach `perf` / `pprof` / `tokio-console` / `flamegraph`. Identify top hot function.
4. **Fix** the hottest contributor (add index, cache, pooling, algorithm swap). ONE change at a time.
5. **Re-baseline** at the same step-up. Knee should move right. If not, the fix was wrong → revert, reprofile.
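The knee rule in step 2 and the "knee should move right" check in step 5 are simple enough to automate. The sketch below assumes each ramp step has been summarized into a hypothetical `Step` record; `findKnee` and `kneeMovedRight` are illustrative names, not part of any load tool.

```typescript
type Step = { rps: number; p99Ms: number; errorRate: number };

// First ramp step where p99 is at least double the baseline p99 or any
// errors appear; undefined if the ramp never reached the knee.
function findKnee(baseline: Step, ramp: Step[]): Step | undefined {
  for (const step of ramp) {
    if (step.p99Ms >= 2 * baseline.p99Ms || step.errorRate > 0) return step;
  }
  return undefined;
}

// After a fix, the knee should sit at a higher RPS (or disappear entirely).
function kneeMovedRight(before: Step | undefined, after: Step | undefined): boolean {
  if (before === undefined) return false; // there was no knee to move
  if (after === undefined) return true; // knee is now beyond the tested range
  return after.rps > before.rps;
}
```

If `kneeMovedRight` returns false after a change, the loop above says to revert and reprofile rather than stack a second fix on top.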

**k6 threshold example (copy into CI):**
```js
export const options = {
  stages: [{ duration: '2m', target: 100 }],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
  },
};
```
If thresholds fail, k6 exits non-zero → CI job red.

**CI integration:**
- Short smoke load test on every PR (30s, low RPS, strict thresholds). Catches obvious regressions.
- Nightly full load test on a dedicated environment, not shared prod.
- Publish HTML report (k6 cloud / Grafana) as a CI artifact.

**Forbidden:**
- Load-testing against production without a killswitch + comms.
- Running without SLOs defined in the test file itself (no "looks ok" verdicts).
- Running multiple load tests in parallel against the same target (interferes with each other).
- Changing two things between runs ("I added an index AND a cache") — can't attribute the delta.
- Ignoring CPU/memory — latency alone hides resource leaks that kill you at 24h.

_blocks/test-property.md (Normal file, 36 lines)
@@ -0,0 +1,36 @@
# TEST — Property-based testing (invariants + shrinking)

A property test asserts an invariant — a statement true for every valid input — and the framework generates hundreds of inputs automatically. On failure, it shrinks the input to the minimal reproducer. Complements unit tests (which assert on hand-picked examples) and fuzz (which throws bytes at a boundary).

**When to use:** pure functions with stable contracts — parsers (`decode ∘ encode = id`), data structures (insert-then-lookup = hit), serializers, math, state machines with invariants. NOT for side-effectful handlers (use integration tests).

**Per-language tool (default):**
- **Rust:** `proptest` — `proptest! { #[test] fn roundtrip(s in "\\PC*") { prop_assert_eq!(decode(encode(&s)), s); } }`. Supports stateful tests via `proptest-state-machine`. Prefer over `quickcheck` (proptest has better shrinking + regression file). [E4, proptest.rs]
- **Python:** `hypothesis` — `@given(st.integers())` / `@given(st.text())`. Stateful: `hypothesis.stateful.RuleBasedStateMachine`. Regression examples auto-saved under `.hypothesis/`. [E4, hypothesis.readthedocs.io]
- **JS/TS:** `fast-check` — `fc.assert(fc.property(fc.string(), s => decode(encode(s)) === s))`. Stateful: `fc.commands`. [E4, fast-check.dev]

**Writing a good property:**
1. **Round-trip:** `f⁻¹(f(x)) == x` (encode/decode, parse/print, serialize/deserialize).
2. **Idempotence:** `f(f(x)) == f(x)` (normalize, sort, dedupe).
3. **Invariant:** `op(x)` preserves property P (inserting a new key grows size by exactly 1; sort preserves the multiset).
4. **Metamorphic:** `f(g(x)) == h(f(x))` (commute operations).
5. **Comparison with oracle:** `my_fast_impl(x) == simple_slow_impl(x)` for all x.
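The round-trip and idempotence shapes above can be shown without any library. This is a hand-rolled sketch for illustration only; in a real project `fast-check` would supply the generators, the runner, and shrinking. The hex codec, `normalize`, and `forAll` are all hypothetical stand-ins.

```typescript
// Deterministic PRNG so a failing run can be replayed from its seed.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

// Generator: random byte array of length 0..maxLen.
function genBytes(rng: () => number, maxLen: number): number[] {
  const len = Math.floor(rng() * (maxLen + 1));
  const out: number[] = [];
  for (let i = 0; i < len; i++) out.push(Math.floor(rng() * 256));
  return out;
}

// Functions under test (illustrative): a hex codec and a sort+dedupe normalizer.
function encodeHex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}
function decodeHex(hex: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < hex.length; i += 2) out.push(parseInt(hex.slice(i, i + 2), 16));
  return out;
}
function normalize(xs: number[]): number[] {
  const sorted = xs.slice().sort((a, b) => a - b);
  return sorted.filter((x, i) => i === 0 || x !== sorted[i - 1]);
}

const sameArray = (a: number[], b: number[]): boolean =>
  a.length === b.length && a.every((x, i) => x === b[i]);

// Runs `prop` on `runs` generated inputs; returns the first counterexample, or undefined.
function forAll(prop: (xs: number[]) => boolean, runs: number, seed: number): number[] | undefined {
  const rng = makeRng(seed);
  for (let i = 0; i < runs; i++) {
    const input = genBytes(rng, 16);
    if (!prop(input)) return input;
  }
  return undefined;
}

// Round-trip: decode(encode(x)) == x. Idempotence: normalize(normalize(x)) == normalize(x).
const roundTrip = (xs: number[]): boolean => sameArray(decodeHex(encodeHex(xs)), xs);
const idempotent = (xs: number[]): boolean => sameArray(normalize(normalize(xs)), normalize(xs));
```

Note that neither property mentions how the functions are implemented, which is what keeps them from restating the code under test.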

**Shrinking:**
- When a test fails, framework automatically shrinks the counterexample to the smallest input reproducing the failure.
- Commit the shrunk example as a regression unit test. Do NOT rely on the `.proptest-regressions` / `.hypothesis/examples` cache alone — commit it, but also pin the hit in a normal test.
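What shrinking does can be sketched with a greedy loop over number arrays. This is an illustration of the idea under simplifying assumptions, not the strategy used by proptest, Hypothesis, or fast-check, and it only reaches a local minimum: `shrink` and the `fails` predicate are hypothetical names.

```typescript
// Greedy shrink: repeatedly try dropping one element or halving one value,
// and keep the first smaller candidate that still fails the property.
function shrink(input: number[], fails: (xs: number[]) => boolean): number[] {
  let current = input.slice();
  let improved = true;
  while (improved) {
    improved = false;
    const candidates: number[][] = [];
    for (let i = 0; i < current.length; i++) {
      candidates.push(current.filter((_, j) => j !== i)); // drop element i
      if (current[i] !== 0) {
        const smaller = current.slice();
        smaller[i] = Math.trunc(current[i] / 2); // move value toward zero
        candidates.push(smaller);
      }
    }
    for (const candidate of candidates) {
      if (fails(candidate)) {
        current = candidate;
        improved = true;
        break;
      }
    }
  }
  return current;
}
```

The returned array is what would be pinned in a normal regression test: small enough to read, and still failing for the same reason as the original counterexample.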

**Stateful tests:**
- Model a state machine: commands + preconditions + postconditions + model state.
- Framework generates valid command sequences, applies to SUT and model, asserts equality.
- Use for data structures, caches, stateful APIs, small DSLs.
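The command / precondition / postcondition / model vocabulary above maps onto a short loop. The sketch below is hand-rolled for illustration, assuming a hypothetical `ArrayStack` as the system under test and a plain array as the model; a real suite would use `fc.commands`, `RuleBasedStateMachine`, or `proptest-state-machine` to generate and shrink the sequences.

```typescript
// System under test (illustrative): a stack with a manual top-of-stack index.
class ArrayStack {
  private items: number[] = [];
  private top = 0;
  push(x: number): void { this.items[this.top++] = x; }
  pop(): number { return this.items[--this.top]; }
  size(): number { return this.top; }
}

type Command = { kind: "push"; value: number } | { kind: "pop" };

function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

// Applies a random command sequence to both SUT and model; false on any divergence.
function runModelTest(seed: number, steps: number): boolean {
  const rng = makeRng(seed);
  const sut = new ArrayStack();
  const model: number[] = [];
  for (let i = 0; i < steps; i++) {
    // Precondition: pop is only generated when the model is non-empty.
    const cmd: Command =
      model.length > 0 && rng() < 0.5
        ? { kind: "pop" }
        : { kind: "push", value: Math.floor(rng() * 1000) };
    if (cmd.kind === "push") {
      sut.push(cmd.value);
      model.push(cmd.value);
    } else if (sut.pop() !== model.pop()) {
      return false; // postcondition: both return the same element
    }
    if (sut.size() !== model.length) return false; // invariant after every command
  }
  return true;
}
```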

**Config discipline:**
- `cases = 1024` as the working default (proptest itself defaults to 256); bump to 10_000 for CI; lower to 64 for quick local iteration.
- Seed explicitly for reproducibility in CI logs (`PROPTEST_CASES=10000 PROPTEST_SEED=42`; confirm the seed variable name against the proptest docs for the version in use).

**Forbidden:**
- Property assertions that just restate the implementation (`f(x) == f(x)`).
- Disabling shrinking ("it took too long") — shrunk output is the whole point.
- Ignoring a single failing case as "flaky" — properties don't flake; the input found a bug.
- Mixing property tests with external services (DB, network) — properties must be deterministic.

@@ -7,6 +7,12 @@ arguments:
required: true
---

> **Complements `/test-matrix`.** `/test-gen` owns per-function unit tests
> (happy / edge / error). `/test-matrix` owns project-wide testing strategy
> (fuzz / property / load / e2e / mutation) and CI wiring. Use `/test-gen`
> for a specific function, `/test-matrix` at project kickoff or when
> coverage gaps span paradigms. See `skills/test-matrix/SKILL.md`.

# Test Generation Workflow

## Step 1: Analyze Target

skills/test-matrix/SKILL.md (Normal file, 104 lines)
@@ -0,0 +1,104 @@
---
name: test-matrix
description: Use when a project needs testing BEYOND unit tests — fuzzing, property-based, load, E2E, or mutation. Five-phase hub-and-spoke pipeline composes the right mix per language × critical path × CI target, scaffolds configs + corpus + fixtures, wires CI jobs, and defines the crash/regression triage workflow. Pure-click: every decision except intake is an AskUserQuestion.
argument-hint: <free-text description of what needs testing and why>
---

# /test-matrix — Testing beyond unit tests (index)

You are designing a **testing matrix** for a project that already has (or
should have) unit-test coverage via `/test-gen`. This skill owns the
orthogonal axes:

- **Fuzzing** — input-space exploration at boundaries (parsers, deserializers, crypto)
- **Property-based** — invariants verified over generated inputs (pure functions, data structures)
- **Load** — SLO assertion under traffic (`k6`/`vegeta`/`oha`, baseline→profile→fix)
- **E2E** — browser-driven critical journeys (Playwright, page objects, trace viewer)
- **Mutation** — test-suite quality verification (mutmut / cargo-mutants / StrykerJS)

**Not duplicated here:** happy-path / edge / error unit tests (`/test-gen`
owns those). This skill links rather than re-implements.

This `SKILL.md` is the INDEX. Each phase lives in its own file, executed in
order. Never skip, never re-order.

---

## Pipeline overview (5 phases + final report)

| Phase | File | Purpose | AskUserQuestion count |
|---|---|---|---:|
| 1 | [phase-1-intake.md](phase-1-intake.md) | Language(s), coverage baseline, critical paths, CI target | 1× (multi-part) |
| 2 | [phase-2-matrix.md](phase-2-matrix.md) | Select test types × languages matrix | 1× multi-select |
| 3 | [phase-3-scaffold.md](phase-3-scaffold.md) | Generate config + corpus + fixtures per selected cell | 1× per cell |
| 4 | [phase-4-ci-wire.md](phase-4-ci-wire.md) | CI job per test type; artifacts; failure policy | 1× multi-select |
| 5 | [phase-5-triage.md](phase-5-triage.md) | Crash + regression triage workflow | 1× |

Minimum AskUserQuestion count across a full session: **5** (one per phase).
Higher when Phase 3 expands per selected cell. This is the pure-click
contract.

---

## Variables the pipeline produces

| Name | Set in | Meaning |
|---|---|---|
| `LANGS` | Phase 1 | Languages in scope (Rust / Python / JS-TS / Go / Swift / Flutter — multi) |
| `COVERAGE` | Phase 1 | Baseline unit-test coverage % (or "unknown") |
| `CRITICAL` | Phase 1 | Critical paths: auth / payment / data-integrity / perf / untrusted-input |
| `CI` | Phase 1 | github-actions / forgejo-actions / self-hosted / none |
| `MATRIX` | Phase 2 | Set of (test-type × language) cells to scaffold |
| `SCAFFOLDED` | Phase 3 | Files written per cell (paths + corpus seeds) |
| `CI_JOBS` | Phase 4 | CI workflow entries added per cell |
| `TRIAGE_DOC` | Phase 5 | Path to `docs/testing/triage.md` (or project-local equivalent) |

---

## Final report (emit after Phase 5)

```
=== TEST-MATRIX REPORT ===
Languages:        <LANGS>
Coverage (unit):  <COVERAGE>
Critical paths:   <CRITICAL>
Matrix cells:     <count> — <list (type × lang)>
Files written:    <count> (configs + corpus + fixtures)
CI jobs added:    <count> (<per-type failure policy>)
Triage doc:       <TRIAGE_DOC>
Next action:      Run <cmd> locally to verify the scaffold, then commit.
```

---

## Rules (apply throughout)

- **Pure-click contract.** Only the Phase 1 intake paragraph is free text.
  Everything else is `AskUserQuestion`. Count in the final report.
- **NO DOWNGRADE (RULE -1).** If a language × type cell has no good tool,
  return 2-3 constructive paths, never "not supported".
- **NO HALLUCINATION (RULE 0.4).** Every tool / library cited must exist
  and be current. When in doubt, mark `[UNVERIFIED — verify release page]`
  and surface in the report.
- **Plan Mode First (RULE 0.5).** This skill IS the plan; no writes before
  the corresponding phase's confirm click.
- **Constructor Pattern (RULE ZERO).** Block files (`_blocks/test-*.md`)
  stay ≤ 60 LOC. This SKILL.md ≤ 200 LOC; phase files ≤ 150 LOC each.
- **Surgical Changes.** Writes only to:
  - `<repo>/tests/`, `<repo>/fuzz/`, `<repo>/e2e/`, `<repo>/load/`
  - `<repo>/.github/workflows/` or `<repo>/.forgejo/workflows/`
  - `<repo>/docs/testing/triage.md`
  - No writes to `_blocks/` here (that's `compose-solution`'s Phase 6).
- **No duplication with `/test-gen`.** If the user really wants unit-test
  generation, Phase 1 detects it and hands off immediately.

---

## References

- [phase-1-intake.md](phase-1-intake.md) · [phase-2-matrix.md](phase-2-matrix.md) · [phase-3-scaffold.md](phase-3-scaffold.md) · [phase-4-ci-wire.md](phase-4-ci-wire.md) · [phase-5-triage.md](phase-5-triage.md)
- `skills/test-gen/SKILL.md` — unit-test generation (happy / edge / error).
  Phase 1 hands off there if intake reveals unit-test gap, not matrix gap.
- `_blocks/test-fuzz.md` · `_blocks/test-property.md` · `_blocks/test-load.md` · `_blocks/test-e2e.md` — per-paradigm reference blocks, composable into manifests.
- `_blocks/rule-test-first.md` — TDD / tests-with-code discipline (inherited).
- `skills/compose-solution/SKILL.md` — if you need a NEW block (e.g. mutation-specific), hand off there (Phase 6 block-augment).

skills/test-matrix/phase-1-intake.md (Normal file, 92 lines)
@@ -0,0 +1,92 @@
# Phase 1 — Intake (language, coverage, critical paths, CI)

One free-text paragraph + one AskUserQuestion multi-part batch.

## 1a — Ask for the testing-gap description

Emit a regular message (NOT AskUserQuestion):

> Describe in one paragraph: what are you testing (project name / stack),
> what gap is `/test-gen` not solving (fuzz? load? E2E? mutation? all?),
> and what failure mode would be worst (prod crash? data loss? latency
> regression? auth bypass?). Reply in one message.

Store verbatim as `INTAKE`.

If `INTAKE` mentions ONLY "unit tests" / "missing tests for function X"
(unit-level gap, not matrix gap), emit:

```
DETECTION: this is a /test-gen task, not /test-matrix.
Handing off to `skills/test-gen/SKILL.md`. Re-run /test-matrix later
when fuzz / property / load / E2E / mutation coverage is needed.
```

…and STOP. Do not proceed.

## 1b — Multi-part intake click (one AskUserQuestion call)

```json
{
  "questions": [
    {
      "question": "Language(s) in scope?",
      "header": "Languages",
      "multiSelect": true,
      "options": [
        {"label": "Rust", "description": "cargo-fuzz, proptest, cargo-mutants, oha"},
        {"label": "Python", "description": "hypothesis, atheris, mutmut, schemathesis"},
        {"label": "JavaScript/TypeScript", "description": "fast-check, StrykerJS, Playwright"},
        {"label": "Go", "description": "built-in fuzz (go test -fuzz), gopter, vegeta"},
        {"label": "Swift", "description": "SwiftCheck, XCUITest — limited fuzz tooling"},
        {"label": "Flutter/Dart", "description": "glados property, flutter integration_test"}
      ]
    },
    {
      "question": "Baseline unit-test coverage?",
      "header": "Coverage",
      "multiSelect": false,
      "options": [
        {"label": "High (≥ 80%)", "description": "Matrix tests layer on top of solid unit base"},
        {"label": "Medium (40-79%)", "description": "Run /test-gen in parallel, don't skip unit gaps"},
        {"label": "Low (< 40%)", "description": "Strongly recommend /test-gen FIRST — fuzz+load on buggy code wastes CI"},
        {"label": "Unknown — need to measure", "description": "Phase 3 will add a coverage job before scaffolding"}
      ]
    },
    {
      "question": "Critical paths (multi-select)?",
      "header": "Critical",
      "multiSelect": true,
      "options": [
        {"label": "Auth / session / crypto", "description": "Fuzz + property mandatory on token parsers + signature verify"},
        {"label": "Payment / money-in-motion", "description": "E2E + property (invariants: no negative balance, idempotency) mandatory"},
        {"label": "Data integrity (DB / serialization)", "description": "Property-based round-trips + migration E2E"},
        {"label": "Performance-sensitive (< 100ms SLO)", "description": "Load tests with k6/oha mandatory; set SLO thresholds in CI"},
        {"label": "Untrusted-input parsing", "description": "Fuzz mandatory (cargo-fuzz / atheris / jsfuzz)"},
        {"label": "User-facing UI flows", "description": "E2E with Playwright on 5-15 critical journeys"}
      ]
    },
    {
      "question": "CI target?",
      "header": "CI",
      "multiSelect": false,
      "options": [
        {"label": "GitHub Actions", "description": "workflow file under .github/workflows/"},
        {"label": "Forgejo Actions", "description": "workflow file under .forgejo/workflows/ (kit default — RULE 0.1 compatible)"},
        {"label": "Self-hosted / custom", "description": "Emit portable YAML + shell scripts; wire manually"},
        {"label": "None — local only", "description": "Generate Makefile / justfile targets, no CI"}
      ]
    }
  ]
}
```

Store as `LANGS`, `COVERAGE`, `CRITICAL`, `CI`.

## Verify-criterion

- `INTAKE` is non-empty.
- `LANGS` has ≥ 1 entry.
- `CRITICAL` has ≥ 1 entry (zero-critical-path tasks are unit-test-only — redirect to /test-gen).
- `CI` is exactly one value.
- On failure, re-ask the failing input only. Never fall through.

skills/test-matrix/phase-2-matrix.md (Normal file, 80 lines)
@@ -0,0 +1,80 @@
# Phase 2 — Select the test-type × language matrix

Goal: turn `CRITICAL` + `LANGS` into the minimum set of `(test-type, language)`
cells to scaffold. Fewer cells, done well, beats many cells half-wired.

## 2a — Preview auto-recommendation

Apply these rules and emit a preview table in chat (markdown):

| Critical path | Recommended test types |
|---|---|
| Auth / crypto | fuzz + property |
| Payment | property + e2e + mutation |
| Data integrity | property + e2e |
| Performance SLO | load |
| Untrusted parsing | fuzz + property |
| User-facing UI | e2e |

Cross-product with `LANGS` → tentative `MATRIX_RECO`. Example output in chat:

```
Recommended cells (from CRITICAL × LANGS):
  [1] fuzz × Rust      — rationale: untrusted-parsing + Rust → cargo-fuzz
  [2] property × Rust  — rationale: data-integrity + Rust → proptest
  [3] e2e × TS         — rationale: user-facing UI → Playwright
  [4] load × Rust      — rationale: <100ms SLO → oha + k6
  [5] mutation × Rust  — rationale: payment → cargo-mutants for suite quality
```

Number each cell for the multi-select.
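The rule table and cross-product can be sketched as a small pure function. This is an illustration under assumptions: the short rule keys (`auth-crypto`, `payment`, and so on) and the `recommend` name are hypothetical, and the sketch takes the full cross-product, leaving pruning (for example, E2E only for the UI language) to the confirmation step in 2b.

```typescript
type TestType = "fuzz" | "property" | "e2e" | "load" | "mutation";
type Cell = { type: TestType; lang: string; rationale: string };

// Rules copied from the preview table; the keys are hypothetical identifiers.
const RULES: Record<string, TestType[]> = {
  "auth-crypto": ["fuzz", "property"],
  "payment": ["property", "e2e", "mutation"],
  "data-integrity": ["property", "e2e"],
  "performance-slo": ["load"],
  "untrusted-parsing": ["fuzz", "property"],
  "user-facing-ui": ["e2e"],
};

function recommend(critical: string[], langs: string[]): Cell[] {
  const cells: Cell[] = [];
  const seen = new Set<string>();
  for (const path of critical) {
    const types = RULES[path] || [];
    for (const type of types) {
      for (const lang of langs) {
        const key = type + "|" + lang;
        if (seen.has(key)) continue; // one cell per (type, lang); first rationale wins
        seen.add(key);
        cells.push({ type, lang, rationale: path });
      }
    }
  }
  return cells;
}
```

Deduplicating on `(type, lang)` is what keeps two critical paths that both call for property tests from producing two property cells for the same language.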

## 2b — Confirm / edit matrix (AskUserQuestion multi-select)

```json
{
  "questions": [
    {
      "question": "Which cells to scaffold this session?",
      "header": "Matrix",
      "multiSelect": true,
      "options": [
        {"label": "[1] fuzz × <lang>", "description": "Generate fuzz target + seed corpus + CI nightly job"},
        {"label": "[2] property × <lang>", "description": "Add property-test dependency + sample invariant test + regression cache"},
        {"label": "[3] e2e × <lang>", "description": "Scaffold Playwright project + 1 page-object example + trace viewer"},
        {"label": "[4] load × <lang>", "description": "k6/oha script + SLO thresholds + profile-loop runbook"},
        {"label": "[5] mutation × <lang>", "description": "mutmut/cargo-mutants/StrykerJS config + baseline mutation score"},
        {"label": "Add a custom cell", "description": "Free-text — e.g. contract tests, chaos tests, visual regression"},
        {"label": "Skip a reco", "description": "Drop one of the recommended cells — free-text reason"}
      ]
    }
  ]
}
```

Options are GENERATED dynamically — one per `MATRIX_RECO` cell PLUS the two
catch-alls (`Add custom`, `Skip`). Replace `<lang>` with the cell's actual
language name.

On `Add a custom cell` → single free-text line → regenerate preview →
re-ask. On `Skip a reco` → free-text reason (logged in final report) →
regenerate → re-ask.

## 2c — Budget check (soft cap)

If the final `MATRIX` has > 6 cells, emit a WARNING message (NOT
AskUserQuestion):

> WARNING: <N> cells selected. Scaffolding + CI wiring for each is ~30 min
> of human review per cell. Consider splitting into two sessions (critical
> cells now, rest next week). Continue? Reply "yes" or re-run Phase 2.

Store the final `MATRIX` as a list of `{type, lang, rationale}` objects.

## Verify-criterion

- `MATRIX` has ≥ 1 cell. Zero cells means nothing to do → stop with a
  message pointing at `/test-gen`.
- Every cell's `type` ∈ {fuzz, property, e2e, load, mutation, custom}.
- Every cell's `lang` ∈ `LANGS` (no phantom language).
- User explicitly confirmed the final matrix (not just auto-reco) — the
  multi-select click counts as the confirmation.
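The four criteria above translate directly into a check that returns one message per failure. A minimal sketch, assuming `MATRIX` is held as an array of `{type, lang, rationale}` objects as described in 2c; `validateMatrix` and the `confirmed` flag are hypothetical names.

```typescript
type Cell = { type: string; lang: string; rationale: string };

const ALLOWED_TYPES = ["fuzz", "property", "e2e", "load", "mutation", "custom"];

// Returns one message per violated criterion; an empty array means Phase 2 may finish.
function validateMatrix(matrix: Cell[], langs: string[], confirmed: boolean): string[] {
  const errors: string[] = [];
  if (matrix.length === 0) {
    errors.push("MATRIX is empty: nothing to do, point the user at /test-gen");
  }
  for (const cell of matrix) {
    if (ALLOWED_TYPES.indexOf(cell.type) === -1) errors.push(`unknown type: ${cell.type}`);
    if (langs.indexOf(cell.lang) === -1) errors.push(`phantom language: ${cell.lang}`);
  }
  if (!confirmed) errors.push("matrix not confirmed by the user");
  return errors;
}
```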

skills/test-matrix/phase-3-scaffold.md (Normal file, 74 lines)
@@ -0,0 +1,74 @@
# Phase 3 — Scaffold config + corpus + fixtures per cell

For each cell in `MATRIX`, generate the minimum-viable scaffold: one
dependency declaration, one example test, one fixture / seed corpus, one
local-run command. No over-scaffolding — just the "it runs" skeleton.

## 3a — Per-cell confirmation (AskUserQuestion, loop over cells)

For each cell, emit ONE AskUserQuestion:

```json
{
  "questions": [
    {
      "question": "Scaffold plan for [<type> × <lang>] — proceed?",
      "header": "<type>/<lang>",
      "multiSelect": false,
      "options": [
        {"label": "Proceed with default scaffold", "description": "Apply the default files listed below"},
        {"label": "Minimal only (dep + 1 test)", "description": "Skip CI + corpus; just prove the toolchain runs"},
        {"label": "Edit one file", "description": "Reply with one free-text path — that file only gets custom content"},
        {"label": "Skip this cell", "description": "Drop from MATRIX; next cell"}
      ]
    }
  ]
}
```

Preview the default scaffold BEFORE asking, so the user sees what "proceed"
means. Example for `[fuzz × Rust]`:

```
Default scaffold for [fuzz × Rust]:
  + fuzz/Cargo.toml              — cargo-fuzz manifest
  + fuzz/fuzz_targets/parse.rs   — example fuzz_target!(|data: &[u8]| { ... })
  + fuzz/corpus/parse/seed_01    — one hand-picked valid input
  + fuzz/README.md               — local-run commands
  Cite: _blocks/test-fuzz.md (corpus mgmt + triage + CI rules)
```

## 3b — Per-type default scaffolds

| Cell | Files |
|---|---|
| **fuzz × Rust** | `fuzz/Cargo.toml` (cargo-fuzz), `fuzz/fuzz_targets/<target>.rs`, `fuzz/corpus/<target>/seed_01` |
| **fuzz × Python** | `tests/fuzz/test_fuzz_<target>.py` (atheris OR hypothesis in fuzz mode), `tests/fuzz/corpus/` |
| **fuzz × JS/TS** | `test/fuzz/<target>.fuzz.ts` (fast-check with `numRuns: 10_000`) |
| **property × Rust** | `Cargo.toml` adds `proptest = "1"` under `[dev-dependencies]` (pin a major version, not `"*"`), `tests/property_<name>.rs`, `.proptest-regressions` gitkeep |
| **property × Python** | `tests/property/test_<name>.py` with `@given`, `.hypothesis/` gitignored except `examples/` |
| **property × JS/TS** | `test/property/<name>.spec.ts` with `fc.assert(fc.property(...))` |
| **load × any** | `load/k6/baseline.js` with SLO thresholds; `load/README.md` with baseline→profile→fix loop |
| **e2e × any** | `e2e/playwright.config.ts`, `e2e/pages/login.page.ts`, `e2e/tests/login.spec.ts`, `e2e/README.md` |
| **mutation × Rust** | `.cargo/mutants.toml`, first run command in `tests/mutation/README.md` |
| **mutation × Python** | `mutmut` config in `setup.cfg` / `pyproject.toml`, runbook in `tests/mutation/README.md` |
| **mutation × JS/TS** | `stryker.conf.mjs` with sane `timeoutMS`, `mutate` glob narrowed to critical paths |

## 3c — Cite the block

Every scaffold file's header comment references the relevant `_blocks/`
file so the human reviewer can find the discipline rules:

```rust
// See _blocks/test-fuzz.md for corpus management + crash-triage rules.
// This file is the minimum skeleton; real targets expand from here.
```

## Verify-criterion

- For every `MATRIX` cell, user clicked `Proceed` / `Minimal` / explicit `Edit` / `Skip`.
- At least one file is written per non-skipped cell.
- `SCAFFOLDED` is a list of `{cell, files: [paths]}` entries.
- No file overwrites an existing one without explicit confirmation
  (a PreWrite check: if path exists, emit a second AskUserQuestion
  "overwrite / skip / rename" before writing).

skills/test-matrix/phase-4-ci-wire.md (Normal file, 87 lines)
@@ -0,0 +1,87 @@
# Phase 4 — CI wiring per cell (artifacts + failure policy)
|
||||
|
||||
Each scaffolded cell gets exactly one CI job. Different paradigms have
|
||||
different failure-budget rules — wire them explicitly, never "all tests
|
||||
block merge by default".
|
||||
|
||||
## 4a — Per-type failure policy (preview)
|
||||
|
||||
Emit a table in chat showing the default policy per `MATRIX` cell:
|
||||
|
||||
| Cell | Trigger | Duration | Failure policy |
|---|---|---|---|
| fuzz (short) | PR | 60 s per target | **block merge** on any crash |
| fuzz (nightly) | cron | 1-4 h per target | **artifact + issue**, do not block PRs |
| property | PR | ~30 s | **block merge** (failures are real bugs) |
| load (smoke) | PR | 30-60 s | **block merge** if SLO thresholds fail |
| load (full) | nightly / manual | 10-30 min | **artifact + dashboard**, do not block PRs |
| e2e (critical) | PR | 2-5 min | **block merge** (retry×2 max) |
| e2e (full) | nightly | 15-30 min | **artifact + trace**, do not block PRs |
| mutation | weekly / manual | hours | **dashboard + report**, NEVER block PRs |
|
||||
|
||||
Rationale written inline: fuzz and load have two lanes (fast smoke on PR,
|
||||
deep nightly). Mutation testing is too slow to block PRs. E2E uses retries
|
||||
but keeps the retry count honest (max 2).
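One way to keep these defaults explicit is to hold them as data rather than scattering them across workflow files. The following is a sketch under assumptions: the lane names follow the job labels used in 4b, while the `LanePolicy` shape and the `mergeBlockers` helper are hypothetical.

```typescript
// Default failure policy per CI lane, transcribed from the policy table.
// The `LanePolicy` shape and helper name are illustrative assumptions.
interface LanePolicy {
  trigger: string;
  blocksMerge: boolean;
}

const DEFAULT_POLICY: Record<string, LanePolicy> = {
  "fuzz-smoke": { trigger: "PR", blocksMerge: true },
  "fuzz-nightly": { trigger: "cron", blocksMerge: false },
  "property": { trigger: "PR", blocksMerge: true },
  "load-smoke": { trigger: "PR", blocksMerge: true },
  "load-full": { trigger: "nightly / manual", blocksMerge: false },
  "e2e-critical": { trigger: "PR", blocksMerge: true },
  "e2e-full": { trigger: "nightly", blocksMerge: false },
  "mutation": { trigger: "weekly / manual", blocksMerge: false },
};

// Given the lanes that failed, return only those that should block a merge.
// Unknown lanes are treated as non-blocking here; a real tool might reject them.
function mergeBlockers(failedLanes: string[]): string[] {
  return failedLanes.filter((lane) => DEFAULT_POLICY[lane]?.blocksMerge === true);
}

const blockers = mergeBlockers(["mutation", "e2e-critical", "fuzz-nightly"]);
console.log(blockers);
```

With the policy in one table, a failed mutation or nightly run never reaches the merge gate by accident.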
|
||||
|
||||
## 4b — Confirm CI jobs (AskUserQuestion multi-select)
|
||||
|
||||
```json
{
  "questions": [
    {
      "question": "Which CI jobs to generate this session?",
      "header": "CI Jobs",
      "multiSelect": true,
      "options": [
        {"label": "fuzz-smoke (PR)", "description": "60s per target per PR; blocks merge on crash"},
        {"label": "fuzz-nightly (cron)", "description": "1-4h deep fuzz; artifacts uploaded; non-blocking"},
        {"label": "property (PR)", "description": "~30s; blocks merge; PROPTEST_CASES=10000 in CI"},
        {"label": "load-smoke (PR)", "description": "30-60s; blocks merge if k6 SLO thresholds fail"},
        {"label": "load-full (nightly)", "description": "10-30m; uploads HTML report; non-blocking"},
        {"label": "e2e-critical (PR)", "description": "5-15 critical journeys; blocks merge; retry×2 max"},
        {"label": "e2e-full (nightly)", "description": "full suite; non-blocking; traces on failure"},
        {"label": "mutation (weekly)", "description": "full mutation run; emits HTML + score; never blocks PRs"},
        {"label": "coverage gate", "description": "add a coverage-diff gate so /test-gen output is measurable"}
      ]
    }
  ]
}
```
|
||||
|
||||
Options are GENERATED: show only the cell types actually present in
`MATRIX`. For example, add `mutation` to the options only if at least one
`mutation × _` cell was selected in Phase 2.
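A minimal sketch of that filtering step, assuming `MATRIX` is available as a list of `{paradigm, language}` cells. The `Cell` and `CiOption` types and the `ciOptionsFor` helper are hypothetical. The `coverage gate` option is left out because the text does not say which cells it depends on.

```typescript
// Hypothetical shapes for a MATRIX cell and an AskUserQuestion option.
interface Cell { paradigm: string; language: string; }
interface CiOption { label: string; description: string; }

// Options grouped by the paradigm that must be present for them to appear.
const OPTIONS_BY_PARADIGM: Record<string, CiOption[]> = {
  fuzz: [
    { label: "fuzz-smoke (PR)", description: "60s per target per PR; blocks merge on crash" },
    { label: "fuzz-nightly (cron)", description: "1-4h deep fuzz; artifacts uploaded; non-blocking" },
  ],
  property: [
    { label: "property (PR)", description: "~30s; blocks merge; PROPTEST_CASES=10000 in CI" },
  ],
  load: [
    { label: "load-smoke (PR)", description: "30-60s; blocks merge if k6 SLO thresholds fail" },
    { label: "load-full (nightly)", description: "10-30m; uploads HTML report; non-blocking" },
  ],
  e2e: [
    { label: "e2e-critical (PR)", description: "5-15 critical journeys; blocks merge; retry×2 max" },
    { label: "e2e-full (nightly)", description: "full suite; non-blocking; traces on failure" },
  ],
  mutation: [
    { label: "mutation (weekly)", description: "full mutation run; emits HTML + score; never blocks PRs" },
  ],
};

// Return only the options whose paradigm appears in at least one MATRIX cell.
function ciOptionsFor(matrix: Cell[]): CiOption[] {
  const present = new Set(matrix.map((cell) => cell.paradigm));
  const options: CiOption[] = [];
  for (const paradigm of Object.keys(OPTIONS_BY_PARADIGM)) {
    const candidates = OPTIONS_BY_PARADIGM[paradigm];
    if (candidates !== undefined && present.has(paradigm)) options.push(...candidates);
  }
  return options;
}

const shown = ciOptionsFor([
  { paradigm: "property", language: "Rust" },
  { paradigm: "e2e", language: "any" },
]);
console.log(shown.map((o) => o.label));
```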
|
||||
|
||||
## 4c — Write the workflow file(s)
|
||||
|
||||
Based on `CI` from Phase 1:
|
||||
|
||||
- **GitHub Actions** → `.github/workflows/test-matrix.yml` with jobs as
|
||||
selected. One matrix-strategy job per paradigm (language matrix inside).
|
||||
- **Forgejo Actions** → `.forgejo/workflows/test-matrix.yml` (same schema
|
||||
as GH Actions, compatible syntax). KeiSeiKit default (RULE 0.1).
|
||||
- **Self-hosted / custom** → emit portable YAML + a `Makefile` / `justfile`
|
||||
with the same job commands so humans can wire into any CI.
|
||||
- **None — local only** → write only `Makefile` / `justfile` targets
|
||||
(`make fuzz-smoke`, `make load-smoke`, etc.) and a `docs/testing/ci.md`
|
||||
note explaining how to wire them into CI later.
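The mapping from the Phase 1 `CI` answer to output files can be sketched as a single exhaustive switch. The `CiKind` values and the `workflowTargets` helper are hypothetical. The location `ci/test-matrix.yml` for the portable YAML is also an assumption, since the text does not name a path for it, and the sketch picks `Makefile` where the text allows either `Makefile` or `justfile`.

```typescript
// Hypothetical encoding of the `CI` answer from Phase 1.
type CiKind = "github" | "forgejo" | "self-hosted" | "none";

// Files to write for each CI kind.
function workflowTargets(ci: CiKind): string[] {
  switch (ci) {
    case "github":
      return [".github/workflows/test-matrix.yml"];
    case "forgejo":
      return [".forgejo/workflows/test-matrix.yml"];
    case "self-hosted":
      // Assumed path for the portable YAML; the source does not specify one.
      return ["ci/test-matrix.yml", "Makefile"];
    case "none":
      return ["Makefile", "docs/testing/ci.md"];
  }
}

console.log(workflowTargets("forgejo"));
console.log(workflowTargets("none"));
```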
|
||||
|
||||
## 4d — Artifact discipline
|
||||
|
||||
Every job uploads one artifact directory, never loose files:
|
||||
|
||||
- `fuzz` → `fuzz/artifacts/` (crash inputs + minimized reproducers)
|
||||
- `load` → `load/reports/` (HTML, JSON summaries, Grafana links)
|
||||
- `e2e` → `test-results/` (traces, videos, screenshots — Playwright default)
|
||||
- `mutation` → `mutation-report/` (HTML + JSON)
|
||||
|
||||
Retention: 30 days default; 90 days for nightly + weekly jobs. Never
|
||||
infinite — CI storage costs compound.
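The directory and retention rules above fit in a small lookup. This is a sketch only: the `artifactPlan` helper and its return shape are hypothetical, and the schedule is passed as a plain string such as `"nightly"` or `"weekly"`.

```typescript
// One artifact directory per paradigm, as listed in 4d.
const ARTIFACT_DIR: Record<string, string> = {
  fuzz: "fuzz/artifacts/",
  load: "load/reports/",
  e2e: "test-results/",
  mutation: "mutation-report/",
};

// 90 days for nightly and weekly jobs, 30 days for everything else.
function retentionDays(schedule: string): number {
  return schedule === "nightly" || schedule === "weekly" ? 90 : 30;
}

// Hypothetical helper combining both rules for one job.
function artifactPlan(paradigm: string, schedule: string) {
  const path = ARTIFACT_DIR[paradigm];
  if (path === undefined) {
    throw new Error(`no artifact directory defined for ${paradigm}`);
  }
  return { path, retentionDays: retentionDays(schedule) };
}

console.log(artifactPlan("e2e", "pr"));
console.log(artifactPlan("mutation", "weekly"));
```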
|
||||
|
||||
## Verify-criterion
|
||||
|
||||
- `CI_JOBS` has ≥ 1 entry (else redirect to local-only Makefile path).
|
||||
- Workflow file is written to the correct path per `CI` from Phase 1.
|
||||
- Every job declares explicit `timeout-minutes` (no unbounded runs).
|
||||
- Every job uploads artifacts on failure (not just on success).
|
||||
- No job sets `continue-on-error: true` on PR-blocking lanes.
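These criteria lend themselves to a mechanical lint over the generated jobs. A minimal sketch, assuming each job has already been parsed into a plain object. The `CiJob` shape and the `lintJobs` helper are hypothetical and do not correspond to any CI system's schema.

```typescript
// Hypothetical, simplified view of one generated CI job.
interface CiJob {
  name: string;
  blocksMerge: boolean;
  timeoutMinutes?: number;
  uploadsArtifactsOnFailure: boolean;
  continueOnError?: boolean;
}

// Return one human-readable problem per violated criterion.
function lintJobs(jobs: CiJob[]): string[] {
  const problems: string[] = [];
  if (jobs.length === 0) {
    problems.push("CI_JOBS is empty: fall back to the local-only Makefile path");
  }
  for (const job of jobs) {
    if (job.timeoutMinutes === undefined) {
      problems.push(`${job.name}: missing timeout-minutes`);
    }
    if (!job.uploadsArtifactsOnFailure) {
      problems.push(`${job.name}: artifacts not uploaded on failure`);
    }
    if (job.blocksMerge && job.continueOnError === true) {
      problems.push(`${job.name}: continue-on-error set on a PR-blocking lane`);
    }
  }
  return problems;
}

const lintResult = lintJobs([
  { name: "e2e-critical", blocksMerge: true, timeoutMinutes: 10, uploadsArtifactsOnFailure: true },
  { name: "fuzz-smoke", blocksMerge: true, uploadsArtifactsOnFailure: true, continueOnError: true },
]);
console.log(lintResult);
```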
|
||||
70
skills/test-matrix/phase-5-triage.md
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
# Phase 5 — Crash / regression triage workflow
|
||||
|
||||
Every matrix paradigm produces artifacts when it fails: fuzz crashes,
|
||||
shrunk property counterexamples, load-SLO violations, E2E traces,
|
||||
mutation survivors. Without a triage runbook, those artifacts rot.
|
||||
This phase writes `docs/testing/triage.md` so the next failure is
|
||||
actionable in ≤ 15 min.
|
||||
|
||||
## 5a — Confirm runbook generation (AskUserQuestion)
|
||||
|
||||
```json
{
  "questions": [
    {
      "question": "Write the triage runbook to docs/testing/triage.md?",
      "header": "Triage",
      "multiSelect": false,
      "options": [
        {"label": "Yes — full runbook", "description": "Per-paradigm crash / regression flow + artifact paths + commit template"},
        {"label": "Yes — minimal", "description": "One-page checklist only; skip per-paradigm deep-dives"},
        {"label": "Skip — team already has one", "description": "Finish without writing; final report notes the external link"}
      ]
    }
  ]
}
```
|
||||
|
||||
## 5b — Runbook template (full)
|
||||
|
||||
For every selected paradigm in `MATRIX`, emit a section:
|
||||
|
||||
```
## <paradigm> failure triage

1. Artifact: <fuzz/artifacts/ | .proptest-regressions | load/reports/ | test-results/ | mutation-report/>
2. Reproduce locally: <exact command from phase-3 scaffold>
3. Minimize: <tmin / shrink / trace-viewer / bisect>
4. Write a failing regression test using the minimized input.
5. Fix root cause (never the symptom — see RULE: No Patching).
6. Re-run the matrix cell. Green = commit with `fix:` + reference artifact SHA.
7. If flaky (not deterministic): quarantine with a ticket, never `retry: 5`.
```
|
||||
|
||||
Per-paradigm specifics are pulled from the cited `_blocks/test-*.md`:
|
||||
- fuzz → `cargo fuzz tmin` / atheris replay flow (block §crash-triage)
|
||||
- property → commit the shrunk counterexample as a normal unit test
|
||||
- load → re-baseline after each fix; one variable at a time
|
||||
- e2e → open `playwright show-trace`; never add `waitForTimeout`
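Filling the template for one paradigm can be sketched as a plain string renderer. The `TriageSection` shape and the `renderTriageSection` helper are hypothetical, and the reproduce command in the example is a placeholder. The real one comes from the Phase 3 scaffold.

```typescript
// Hypothetical inputs for one runbook section.
interface TriageSection {
  paradigm: string;
  artifact: string;
  reproduce: string; // exact command from the phase-3 scaffold
  minimize: string;
}

// Render the seven-step template for a single paradigm.
function renderTriageSection(s: TriageSection): string {
  return [
    `## ${s.paradigm} failure triage`,
    "",
    `1. Artifact: ${s.artifact}`,
    `2. Reproduce locally: ${s.reproduce}`,
    `3. Minimize: ${s.minimize}`,
    "4. Write a failing regression test using the minimized input.",
    "5. Fix root cause (never the symptom).",
    "6. Re-run the matrix cell. Green = commit with `fix:` + reference artifact SHA.",
    "7. If flaky (not deterministic): quarantine with a ticket, never `retry: 5`.",
  ].join("\n");
}

const e2eSection = renderTriageSection({
  paradigm: "e2e",
  artifact: "test-results/",
  reproduce: "npx playwright test e2e/tests/login.spec.ts", // placeholder command
  minimize: "npx playwright show-trace <trace.zip>",
});
console.log(e2eSection);
```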
|
||||
|
||||
## 5c — Commit template
|
||||
|
||||
The runbook ends with a ready-to-copy commit template:
|
||||
|
||||
```
fix(<paradigm>): <one-line symptom>

Reproducer: <minimized artifact path + SHA>
Root cause: <1-2 sentences>
Regression test: <path to new permanent test>

See docs/testing/triage.md §<paradigm> for the workflow used.
```
|
||||
|
||||
## Verify-criterion
|
||||
|
||||
- `TRIAGE_DOC` is set to `docs/testing/triage.md` (or skipped with reason).
|
||||
- Every `MATRIX` paradigm has a section in the runbook.
|
||||
- Every section lists artifact path + reproduce command + regression-test
|
||||
requirement + root-cause discipline + flake policy.
|
||||
- Commit template present at end of doc.
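Part of this check can be automated against the generated runbook text. The sketch below assumes section headings follow the `## <paradigm> failure triage` pattern from 5b and that the commit template begins with a `fix(` line. The helper names are hypothetical.

```typescript
// Paradigms from MATRIX that have no `## <paradigm> failure triage` heading.
function missingSections(runbook: string, paradigms: string[]): string[] {
  const lines = runbook.split("\n").map((line) => line.trim());
  return paradigms.filter((p) => lines.indexOf(`## ${p} failure triage`) === -1);
}

// True if a commit-template line appears after the last section heading.
function commitTemplateAtEnd(runbook: string): boolean {
  const lastHeading = runbook.lastIndexOf(" failure triage");
  const template = runbook.lastIndexOf("fix(");
  return template !== -1 && template > lastHeading;
}

const runbook = [
  "## fuzz failure triage",
  "1. Artifact: fuzz/artifacts/",
  "",
  "## e2e failure triage",
  "1. Artifact: test-results/",
  "",
  "fix(<paradigm>): <one-line symptom>",
].join("\n");

console.log(missingSections(runbook, ["fuzz", "e2e", "load"]));
console.log(commitTemplateAtEnd(runbook));
```

A text check like this only covers structure. Whether each section's reproduce command works still needs a human or a dry run.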
|
||||