feat(skills): /test-matrix 5-phase pipeline

Adds hub-and-spoke testing-matrix skill complementing /test-gen:
SKILL.md index + phase-1-intake (language/coverage/critical/CI),
phase-2-matrix (test-type × language multi-select), phase-3-scaffold
(config + corpus + fixtures per cell), phase-4-ci-wire (per-type
failure policy + artifacts), phase-5-triage (crash/regression runbook).
Cross-refs _blocks/test-fuzz.md, test-property.md, test-load.md,
test-e2e.md. Adds "complements" note to skills/test-gen/SKILL.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Parfii-bot 2026-04-21 20:46:02 +08:00
parent 8b6ee37134
commit 56ddccfddb
7 changed files with 513 additions and 0 deletions

View file

@ -7,6 +7,12 @@ arguments:
required: true
---
> **Complements `/test-matrix`.** `/test-gen` owns per-function unit tests
> (happy / edge / error). `/test-matrix` owns project-wide testing strategy
> (fuzz / property / load / e2e / mutation) and CI wiring. Use `/test-gen`
> for a specific function, `/test-matrix` at project kickoff or when
> coverage gaps span paradigms. See `skills/test-matrix/SKILL.md`.
# Test Generation Workflow
## Step 1: Analyze Target

104
skills/test-matrix/SKILL.md Normal file
View file

@ -0,0 +1,104 @@
---
name: test-matrix
description: Use when a project needs testing BEYOND unit tests — fuzzing, property-based, load, E2E, or mutation. Five-phase hub-and-spoke pipeline composes the right mix per language × critical path × CI target, scaffolds configs + corpus + fixtures, wires CI jobs, and defines the crash/regression triage workflow. Pure-click: every decision except intake is an AskUserQuestion.
argument-hint: <free-text description of what needs testing and why>
---
# /test-matrix — Testing beyond unit tests (index)
You are designing a **testing matrix** for a project that already has (or
should have) unit-test coverage via `/test-gen`. This skill owns the
orthogonal axes:
- **Fuzzing** — input-space exploration at boundaries (parsers, deserializers, crypto)
- **Property-based** — invariants verified over generated inputs (pure functions, data structures)
- **Load** — SLO assertion under traffic (`k6`/`vegeta`/`oha`, baseline→profile→fix)
- **E2E** — browser-driven critical journeys (Playwright, page objects, trace viewer)
- **Mutation** — test-suite quality verification (mutmut / cargo-mutants / StrykerJS)
**Not duplicated here:** happy-path / edge / error unit tests (`/test-gen`
owns those). This skill links rather than re-implements.
This `SKILL.md` is the INDEX. Each phase lives in its own file, executed in
order. Never skip, never re-order.
---
## Pipeline overview (5 phases + final report)
| Phase | File | Purpose | AskUserQuestion count |
|---|---|---|---:|
| 1 | [phase-1-intake.md](phase-1-intake.md) | Language(s), coverage baseline, critical paths, CI target | 1× (multi-part) |
| 2 | [phase-2-matrix.md](phase-2-matrix.md) | Select test types × languages matrix | 1× multi-select |
| 3 | [phase-3-scaffold.md](phase-3-scaffold.md) | Generate config + corpus + fixtures per selected cell | 1× per cell |
| 4 | [phase-4-ci-wire.md](phase-4-ci-wire.md) | CI job per test type; artifacts; failure policy | 1× multi-select |
| 5 | [phase-5-triage.md](phase-5-triage.md) | Crash + regression triage workflow | 1× |
Minimum AskUserQuestion count across a full session: **5** (one per phase).
Higher when Phase 3 expands per selected cell. This is the pure-click
contract.
---
## Variables the pipeline produces
| Name | Set in | Meaning |
|---|---|---|
| `LANGS` | Phase 1 | Languages in scope (Rust / Python / JS-TS / Go / Swift / Flutter — multi) |
| `COVERAGE` | Phase 1 | Baseline unit-test coverage % (or "unknown") |
| `CRITICAL` | Phase 1 | Critical paths: auth / payment / data-integrity / perf / untrusted-input |
| `CI` | Phase 1 | github-actions / forgejo-actions / self-hosted / none |
| `MATRIX` | Phase 2 | Set of (test-type × language) cells to scaffold |
| `SCAFFOLDED` | Phase 3 | Files written per cell (paths + corpus seeds) |
| `CI_JOBS` | Phase 4 | CI workflow entries added per cell |
| `TRIAGE_DOC` | Phase 5 | Path to `docs/testing/triage.md` (or project-local equivalent) |
---
## Final report (emit after Phase 5)
```
=== TEST-MATRIX REPORT ===
Languages: <LANGS>
Coverage (unit): <COVERAGE>
Critical paths: <CRITICAL>
Matrix cells: <count><list (type × lang)>
Files written: <count> (configs + corpus + fixtures)
CI jobs added: <count> (<per-type failure policy>)
Triage doc: <TRIAGE_DOC>
Next action: Run <cmd> locally to verify the scaffold, then commit.
```
---
## Rules (apply throughout)
- **Pure-click contract.** Only the Phase 1 intake paragraph is free text.
Everything else is `AskUserQuestion`. Count in the final report.
- **NO DOWNGRADE (RULE -1).** If a language × type cell has no good tool,
return 2-3 constructive paths, never "not supported".
- **NO HALLUCINATION (RULE 0.4).** Every tool / library cited must exist
and be current. When in doubt, mark `[UNVERIFIED — verify release page]`
and surface in the report.
- **Plan Mode First (RULE 0.5).** This skill IS the plan; no writes before
the corresponding phase's confirm click.
- **Constructor Pattern (RULE ZERO).** Block files (`_blocks/test-*.md`)
stay ≤ 60 LOC. This SKILL.md ≤ 200 LOC; phase files ≤ 150 LOC each.
- **Surgical Changes.** Writes only to:
- `<repo>/tests/`, `<repo>/fuzz/`, `<repo>/e2e/`, `<repo>/load/`
- `<repo>/.github/workflows/` or `<repo>/.forgejo/workflows/`
- `<repo>/docs/testing/triage.md`
- No writes to `_blocks/` here (that's `compose-solution`'s Phase 6).
- **No duplication with `/test-gen`.** If the user really wants unit-test
generation, Phase 1 detects it and hands off immediately.
---
## References
- [phase-1-intake.md](phase-1-intake.md) · [phase-2-matrix.md](phase-2-matrix.md) · [phase-3-scaffold.md](phase-3-scaffold.md) · [phase-4-ci-wire.md](phase-4-ci-wire.md) · [phase-5-triage.md](phase-5-triage.md)
- `skills/test-gen/SKILL.md` — unit-test generation (happy / edge / error).
Phase 1 hands off there if intake reveals unit-test gap, not matrix gap.
- `_blocks/test-fuzz.md` · `_blocks/test-property.md` · `_blocks/test-load.md` · `_blocks/test-e2e.md` — per-paradigm reference blocks, composable into manifests.
- `_blocks/rule-test-first.md` — TDD / tests-with-code discipline (inherited).
- `skills/compose-solution/SKILL.md` — if you need a NEW block (e.g. mutation-specific), hand off there (Phase 6 block-augment).

View file

@ -0,0 +1,92 @@
# Phase 1 — Intake (language, coverage, critical paths, CI)
One free-text paragraph + one AskUserQuestion multi-part batch.
## 1a — Ask for the testing-gap description
Emit a regular message (NOT AskUserQuestion):
> Describe in one paragraph: what are you testing (project name / stack),
> what gap is `/test-gen` not solving (fuzz? load? E2E? mutation? all?),
> and what failure mode would be worst (prod crash? data loss? latency
> regression? auth bypass?). Reply in one message.
Store verbatim as `INTAKE`.
If `INTAKE` mentions ONLY "unit tests" / "missing tests for function X"
(unit-level gap, not matrix gap), emit:
```
DETECTION: this is a /test-gen task, not /test-matrix.
Handing off to `skills/test-gen/SKILL.md`. Re-run /test-matrix later
when fuzz / property / load / E2E / mutation coverage is needed.
```
…and STOP. Do not proceed.
## 1b — Multi-part intake click (one AskUserQuestion call)
```json
{
"questions": [
{
"question": "Language(s) in scope?",
"header": "Languages",
"multiSelect": true,
"options": [
{"label": "Rust", "description": "cargo-fuzz, proptest, cargo-mutants, oha"},
{"label": "Python", "description": "hypothesis, atheris, mutmut, schemathesis"},
{"label": "JavaScript/TypeScript", "description": "fast-check, StrykerJS, Playwright"},
{"label": "Go", "description": "built-in fuzz (go test -fuzz), gopter, vegeta"},
{"label": "Swift", "description": "SwiftCheck, XCUITest — limited fuzz tooling"},
{"label": "Flutter/Dart", "description": "glados property, flutter integration_test"}
]
},
{
"question": "Baseline unit-test coverage?",
"header": "Coverage",
"multiSelect": false,
"options": [
{"label": "High (≥ 80%)", "description": "Matrix tests layer on top of solid unit base"},
{"label": "Medium (40-80%)", "description": "Run /test-gen in parallel, don't skip unit gaps"},
{"label": "Low (< 40%)", "description": "Strongly recommend /test-gen FIRST fuzz+load on buggy code wastes CI"},
{"label": "Unknown — need to measure", "description": "Phase 3 will add a coverage job before scaffolding"}
]
},
{
"question": "Critical paths (multi-select)?",
"header": "Critical",
"multiSelect": true,
"options": [
{"label": "Auth / session / crypto", "description": "Fuzz + property mandatory on token parsers + signature verify"},
{"label": "Payment / money-in-motion", "description": "E2E + property (invariants: no negative balance, idempotency) mandatory"},
{"label": "Data integrity (DB / serialization)", "description": "Property-based round-trips + migration E2E"},
{"label": "Performance-sensitive (< 100ms SLO)", "description": "Load tests with k6/oha mandatory; set SLO thresholds in CI"},
{"label": "Untrusted-input parsing", "description": "Fuzz mandatory (cargo-fuzz / atheris / jsfuzz)"},
{"label": "User-facing UI flows", "description": "E2E with Playwright on 5-15 critical journeys"}
]
},
{
"question": "CI target?",
"header": "CI",
"multiSelect": false,
"options": [
{"label": "GitHub Actions", "description": "workflow file under .github/workflows/"},
{"label": "Forgejo Actions", "description": "workflow file under .forgejo/workflows/ (kit default — RULE 0.1 compatible)"},
{"label": "Self-hosted / custom", "description": "Emit portable YAML + shell scripts; wire manually"},
{"label": "None — local only", "description": "Generate Makefile / justfile targets, no CI"}
]
}
]
}
```
Store as `LANGS`, `COVERAGE`, `CRITICAL`, `CI`.
## Verify-criterion
- `INTAKE` is non-empty.
- `LANGS` has ≥ 1 entry.
- `CRITICAL` has ≥ 1 entry (zero-critical-path tasks are unit-test-only — redirect to /test-gen).
- `CI` is exactly one value.
- On failure, re-ask the failing input only. Never fall through.

View file

@ -0,0 +1,80 @@
# Phase 2 — Select the test-type × language matrix
Goal: turn `CRITICAL` + `LANGS` into the minimum set of `(test-type, language)`
cells to scaffold. Fewer cells, done well, beats many cells half-wired.
## 2a — Preview auto-recommendation
Apply these rules and emit a preview table in chat (markdown):
| Critical path | Recommended test types |
|---|---|
| Auth / crypto | fuzz + property |
| Payment | property + e2e + mutation |
| Data integrity | property + e2e |
| Performance SLO | load |
| Untrusted parsing | fuzz + property |
| User-facing UI | e2e |
Cross-product with `LANGS` → tentative `MATRIX_RECO`. Example output in chat:
```
Recommended cells (from CRITICAL × LANGS):
[1] fuzz × Rust — rationale: untrusted-parsing + Rust → cargo-fuzz
[2] property × Rust — rationale: data-integrity + Rust → proptest
[3] e2e × TS — rationale: user-facing UI → Playwright
[4] load × Rust — rationale: <100ms SLO oha + k6
[5] mutation × Rust — rationale: payment → cargo-mutants for suite quality
```
Number each cell for the multi-select.
## 2b — Confirm / edit matrix (AskUserQuestion multi-select)
```json
{
"questions": [
{
"question": "Which cells to scaffold this session?",
"header": "Matrix",
"multiSelect": true,
"options": [
{"label": "[1] fuzz × <lang>", "description": "Generate fuzz target + seed corpus + CI nightly job"},
{"label": "[2] property × <lang>", "description": "Add property-test dependency + sample invariant test + regression cache"},
{"label": "[3] e2e × <lang>", "description": "Scaffold Playwright project + 1 page-object example + trace viewer"},
{"label": "[4] load × <lang>", "description": "k6/oha script + SLO thresholds + profile-loop runbook"},
{"label": "[5] mutation × <lang>", "description": "mutmut/cargo-mutants/StrykerJS config + baseline mutation score"},
{"label": "Add a custom cell", "description": "Free-text — e.g. contract tests, chaos tests, visual regression"},
{"label": "Skip a reco", "description": "Drop one of the recommended cells — free-text reason"}
]
}
]
}
```
Options are GENERATED dynamically — one per `MATRIX_RECO` cell PLUS the two
catch-alls (`Add custom`, `Skip`). Substitute `<lang>` literally.
On `Add a custom cell` → single free-text line → regenerate preview →
re-ask. On `Skip a reco` → free-text reason (logged in final report) →
regenerate → re-ask.
## 2c — Budget check (soft cap)
If the final `MATRIX` has > 6 cells, emit a WARNING message (NOT
AskUserQuestion):
> WARNING: <N> cells selected. Scaffolding + CI wiring for each is ~30 min
> of human review per cell. Consider splitting into two sessions (critical
> cells now, rest next week). Continue? Reply "yes" or re-run Phase 2.
Store the final `MATRIX` as a list of `{type, lang, rationale}` objects.
## Verify-criterion
- `MATRIX` has ≥ 1 cell. Zero cells means nothing to do → stop with a
message pointing at `/test-gen`.
- Every cell's `type` ∈ {fuzz, property, e2e, load, mutation, custom}.
- Every cell's `lang``LANGS` (no phantom language).
- User explicitly confirmed the final matrix (not just auto-reco) — the
multi-select click counts as the confirmation.

View file

@ -0,0 +1,74 @@
# Phase 3 — Scaffold config + corpus + fixtures per cell
For each cell in `MATRIX`, generate the minimum-viable scaffold: one
dependency declaration, one example test, one fixture / seed corpus, one
local-run command. No over-scaffolding — just the "it runs" skeleton.
## 3a — Per-cell confirmation (AskUserQuestion, loop over cells)
For each cell, emit ONE AskUserQuestion:
```json
{
"questions": [
{
"question": "Scaffold plan for [<type> × <lang>] — proceed?",
"header": "<type>/<lang>",
"multiSelect": false,
"options": [
{"label": "Proceed with default scaffold", "description": "Apply the default files listed below"},
{"label": "Minimal only (dep + 1 test)", "description": "Skip CI + corpus; just prove the toolchain runs"},
{"label": "Edit one file", "description": "Reply with one free-text path — that file only gets custom content"},
{"label": "Skip this cell", "description": "Drop from MATRIX; next cell"}
]
}
]
}
```
Preview the default scaffold BEFORE asking, so the user sees what "proceed"
means. Example for `[fuzz × Rust]`:
```
Default scaffold for [fuzz × Rust]:
+ fuzz/Cargo.toml — cargo-fuzz manifest
+ fuzz/fuzz_targets/parse.rs — example fuzz_target!(|data: &[u8]| { ... })
+ fuzz/corpus/parse/seed_01 — one hand-picked valid input
+ fuzz/README.md — local-run commands
Cite: _blocks/test-fuzz.md (corpus mgmt + triage + CI rules)
```
## 3b — Per-type default scaffolds
| Cell | Files |
|---|---|
| **fuzz × Rust** | `fuzz/Cargo.toml` (cargo-fuzz), `fuzz/fuzz_targets/<target>.rs`, `fuzz/corpus/<target>/seed_01` |
| **fuzz × Python** | `tests/fuzz/test_fuzz_<target>.py` (atheris OR hypothesis in fuzz mode), `tests/fuzz/corpus/` |
| **fuzz × JS/TS** | `test/fuzz/<target>.fuzz.ts` (fast-check with `numRuns: 10_000`) |
| **property × Rust** | `Cargo.toml` adds `proptest = "*"`, `tests/property_<name>.rs`, `.proptest-regressions` gitkeep |
| **property × Python** | `tests/property/test_<name>.py` with `@given`, `.hypothesis/` gitignored except `examples/` |
| **property × JS/TS** | `test/property/<name>.spec.ts` with `fc.assert(fc.property(...))` |
| **load × any** | `load/k6/baseline.js` with SLO thresholds; `load/README.md` with baseline→profile→fix loop |
| **e2e × any** | `e2e/playwright.config.ts`, `e2e/pages/login.page.ts`, `e2e/tests/login.spec.ts`, `e2e/README.md` |
| **mutation × Rust** | `.cargo-mutants.toml`, first run command in `tests/mutation/README.md` |
| **mutation × Python** | `mutmut` config in `setup.cfg` / `pyproject.toml`, runbook in `tests/mutation/README.md` |
| **mutation × JS/TS** | `stryker.conf.mjs` with sane `timeoutMS`, `mutate` glob narrowed to critical paths |
## 3c — Cite the block
Every scaffold file's header comment references the relevant `_blocks/`
file so the human reviewer can find the discipline rules:
```rust
// See _blocks/test-fuzz.md for corpus management + crash-triage rules.
// This file is the minimum skeleton; real targets expand from here.
```
## Verify-criterion
- For every `MATRIX` cell, user clicked `Proceed` / `Minimal` / explicit `Edit` / `Skip`.
- At least one file is written per non-skipped cell.
- `SCAFFOLDED` is a list of `{cell, files: [paths]}` entries.
- No file overwrites an existing one without explicit confirmation
(a PreWrite check: if path exists, emit a second AskUserQuestion
"overwrite / skip / rename" before writing).

View file

@ -0,0 +1,87 @@
# Phase 4 — CI wiring per cell (artifacts + failure policy)
Each scaffolded cell gets exactly one CI job. Different paradigms have
different failure-budget rules — wire them explicitly, never "all tests
block merge by default".
## 4a — Per-type failure policy (preview)
Emit a table in chat showing the default policy per `MATRIX` cell:
| Cell | Trigger | Duration | Failure policy |
|---|---|---|---|
| fuzz (short) | PR | 60 s per target | **block merge** on any crash |
| fuzz (nightly) | cron | 1-4 h per target | **artifact + issue**, do not block PRs |
| property | PR | ~30 s | **block merge** (failures are real bugs) |
| load (smoke) | PR | 30-60 s | **block merge** if SLO thresholds fail |
| load (full) | nightly / manual | 10-30 min | **artifact + dashboard**, do not block PRs |
| e2e (critical) | PR | 2-5 min | **block merge** (retry×2 max) |
| e2e (full) | nightly | 15-30 min | **artifact + trace**, do not block PRs |
| mutation | weekly / manual | hours | **dashboard + report**, NEVER block PRs |
Rationale written inline: fuzz and load have two lanes (fast smoke on PR,
deep nightly). Mutation testing is too slow to block PRs. E2E uses retries
but keeps the retry count honest (max 2).
## 4b — Confirm CI jobs (AskUserQuestion multi-select)
```json
{
"questions": [
{
"question": "Which CI jobs to generate this session?",
"header": "CI Jobs",
"multiSelect": true,
"options": [
{"label": "fuzz-smoke (PR)", "description": "60s per target per PR; blocks merge on crash"},
{"label": "fuzz-nightly (cron)", "description": "1-4h deep fuzz; artifacts uploaded; non-blocking"},
{"label": "property (PR)", "description": "~30s; blocks merge; PROPTEST_CASES=10000 in CI"},
{"label": "load-smoke (PR)", "description": "30-60s; blocks merge if k6 SLO thresholds fail"},
{"label": "load-full (nightly)", "description": "10-30m; uploads HTML report; non-blocking"},
{"label": "e2e-critical (PR)", "description": "5-15 critical journeys; blocks merge; retry×2 max"},
{"label": "e2e-full (nightly)", "description": "full suite; non-blocking; traces on failure"},
{"label": "mutation (weekly)", "description": "full mutation run; emits HTML + score; never blocks PRs"},
{"label": "coverage gate", "description": "add a coverage-diff gate so /test-gen output is measurable"}
]
}
]
}
```
Options are GENERATED — only show the cell types actually present in
`MATRIX`. Adding `mutation` to options only if at least one `mutation × _`
cell was selected in Phase 2.
## 4c — Write the workflow file(s)
Based on `CI` from Phase 1:
- **GitHub Actions**`.github/workflows/test-matrix.yml` with jobs as
selected. One matrix-strategy job per paradigm (language matrix inside).
- **Forgejo Actions**`.forgejo/workflows/test-matrix.yml` (same schema
as GH Actions, compatible syntax). KeiSeiKit default (RULE 0.1).
- **Self-hosted / custom** → emit portable YAML + a `Makefile` / `justfile`
with the same job commands so humans can wire into any CI.
- **None — local only** → write only `Makefile` / `justfile` targets
(`make fuzz-smoke`, `make load-smoke`, etc.) and a `docs/testing/ci.md`
note explaining how to wire them into CI later.
## 4d — Artifact discipline
Every job uploads one artifact directory, never loose files:
- `fuzz``fuzz/artifacts/` (crash inputs + minimized reproducers)
- `load``load/reports/` (HTML, JSON summaries, Grafana links)
- `e2e``test-results/` (traces, videos, screenshots — Playwright default)
- `mutation``mutation-report/` (HTML + JSON)
Retention: 30 days default; 90 days for nightly + weekly jobs. Never
infinite — CI storage costs compound.
## Verify-criterion
- `CI_JOBS` has ≥ 1 entry (else redirect to local-only Makefile path).
- Workflow file writes to the correct path per `CI` from Phase 1.
- Every job declares explicit `timeout-minutes` (no unbounded runs).
- Every job uploads artifacts on failure (not just on success).
- No job `continue-on-error: true` for PR-blocking lanes.

View file

@ -0,0 +1,70 @@
# Phase 5 — Crash / regression triage workflow
Every matrix paradigm produces artifacts when it fails: fuzz crashes,
shrunk property counterexamples, load-SLO violations, E2E traces,
mutation survivors. Without a triage runbook, those artifacts rot.
This phase writes `docs/testing/triage.md` so the next failure is
actionable in ≤ 15 min.
## 5a — Confirm runbook generation (AskUserQuestion)
```json
{
"questions": [
{
"question": "Write the triage runbook to docs/testing/triage.md?",
"header": "Triage",
"multiSelect": false,
"options": [
{"label": "Yes — full runbook", "description": "Per-paradigm crash / regression flow + artifact paths + commit template"},
{"label": "Yes — minimal", "description": "One-page checklist only; skip per-paradigm deep-dives"},
{"label": "Skip — team already has one", "description": "Finish without writing; final report notes the external link"}
]
}
]
}
```
## 5b — Runbook template (full)
For every selected paradigm in `MATRIX`, emit a section:
```
## <paradigm> failure triage
1. Artifact: <fuzz/artifacts/ | .proptest-regressions | load/reports/ | test-results/ | mutation-report/>
2. Reproduce locally: <exact command from phase-3 scaffold>
3. Minimize: <tmin / shrink / trace-viewer / bisect>
4. Write a failing regression test using the minimized input.
5. Fix root cause (never the symptom — see RULE: No Patching).
6. Re-run the matrix cell. Green = commit with `fix:` + reference artifact SHA.
7. If flaky (not deterministic): quarantine with a ticket, never `retry: 5`.
```
Per-paradigm specifics are pulled from the citing `_blocks/test-*.md`:
- fuzz → `cargo fuzz tmin` / atheris replay flow (block §crash-triage)
- property → commit the shrunk counterexample as a normal unit test
- load → re-baseline after each fix; one variable at a time
- e2e → open `playwright show-trace`; never add `waitForTimeout`
## 5c — Commit template
The runbook ends with a ready-to-copy commit template:
```
fix(<paradigm>): <one-line symptom>
Reproducer: <minimized artifact path + SHA>
Root cause: <1-2 sentences>
Regression test: <path to new permanent test>
See docs/testing/triage.md §<paradigm> for the workflow used.
```
## Verify-criterion
- `TRIAGE_DOC` is set to `docs/testing/triage.md` (or skipped with reason).
- Every `MATRIX` paradigm has a section in the runbook.
- Every section lists artifact path + reproduce command + regression-test
requirement + root-cause discipline + flake policy.
- Commit template present at end of doc.