From 8b6ee371347fc9b1b91a21249b5fcd2d1a689fee Mon Sep 17 00:00:00 2001
From: Parfii-bot
Date: Tue, 21 Apr 2026 20:32:45 +0800
Subject: [PATCH] =?UTF-8?q?feat(blocks):=204=20testing=20blocks=20?=
 =?UTF-8?q?=E2=80=94=20fuzz/property/load/e2e?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds four behavioural blocks for testing paradigms beyond unit tests
(test-gen already covers unit-test generation):

- test-fuzz.md: cargo-fuzz/hypothesis/fast-check corpus + triage + CI
- test-property.md: proptest/hypothesis/fast-check invariants + shrinking
- test-load.md: k6/vegeta/oha/hyperfine baseline→profile→fix loop + SLO
- test-e2e.md: Playwright page-objects + trace viewer + flake policy

Each block 32-53 LOC (within 60-LOC block cap). Single-concern, composable
via _manifests/*.toml like any other _blocks/*.md.

Tooling cited at [E4] based on official docs; version pinning deferred to
consumers.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 _blocks/test-e2e.md      | 53 ++++++++++++++++++++++++++++++++++++++++
 _blocks/test-fuzz.md     | 32 ++++++++++++++++++++++++
 _blocks/test-load.md     | 48 ++++++++++++++++++++++++++++++++++++
 _blocks/test-property.md | 36 +++++++++++++++++++++++++++
 4 files changed, 169 insertions(+)
 create mode 100644 _blocks/test-e2e.md
 create mode 100644 _blocks/test-fuzz.md
 create mode 100644 _blocks/test-load.md
 create mode 100644 _blocks/test-property.md

diff --git a/_blocks/test-e2e.md b/_blocks/test-e2e.md
new file mode 100644
index 0000000..e207dfe
--- /dev/null
+++ b/_blocks/test-e2e.md
@@ -0,0 +1,53 @@
+# TEST: End-to-end (Playwright browser automation)
+
+E2E tests drive a real browser against a real deployed stack and assert user-visible behaviour. Slow + flaky by nature, so discipline matters more than count. One reliable E2E beats ten flaky ones.
+
+**Default tool:** `Playwright` (Microsoft, TS/JS/Python/.NET/Java bindings).
Preferred over Cypress because: multi-browser (Chromium / Firefox / WebKit), parallel by default, trace viewer (time-travel debugger), auto-waiting for elements, network interception built-in. [E4, playwright.dev]
+
+Cypress is the runner-up; use it only if the team already owns it. `Selenium` is legacy; avoid it for new E2E.
+
+**Scope:**
+- E2E = **critical user journeys only** (login, checkout, primary CRUD flow, signup). Target ~5-15 tests, not 500.
+- Everything else (form validation, error states, edge cases) → unit + integration + component tests.
+- Rule: if a regression here would be a production incident, it's an E2E candidate.
+
+**Page Object pattern (mandatory):**
+```ts
+class LoginPage {
+  constructor(private page: Page) {} // Page: imported from '@playwright/test'
+  async goto() { await this.page.goto('/login'); }
+  async login(user: string, pass: string) {
+    await this.page.getByLabel('Email').fill(user);
+    await this.page.getByLabel('Password').fill(pass);
+    await this.page.getByRole('button', { name: 'Sign in' }).click();
+  }
+}
+```
+Selectors live in the page object, never in the test. When the UI changes, ONE file updates.
+
+**Selector discipline:**
+- Prefer `getByRole` / `getByLabel` / `getByText` (accessibility-anchored, survive CSS refactors).
+- Fall back to `data-testid` attributes added purely for tests.
+- AVOID CSS class selectors, XPath, and `nth-child`: they break whenever styling or markup changes.
+
+**Test isolation:**
+- Each test gets a clean auth state via `storageState` fixtures (log in once per project, reuse the cookie jar).
+- Each test uses a fresh data scope: a disposable test tenant, a UUID prefix, or DB truncation in a `beforeEach`.
+- NEVER depend on test ordering. Parallel-safe by construction.
+
+**CI headless + tracing:**
+- Headless by default, headed only when debugging locally (`--headed --debug`).
+- Enable trace on retry: `trace: 'on-first-retry'` adds no tracing overhead to runs that pass first time and gives full forensics on flakes.
+- Upload `test-results/` as a CI artifact.
Open traces with `npx playwright show-trace trace.zip`.
+- Video + screenshots on failure: `video: 'retain-on-failure'`, `screenshot: 'only-on-failure'`.
+
+**Flake policy:**
+- Retry **at most twice** in CI. If a test retries often, it's a real bug, either in the SUT or in the test.
+- Quarantine flaky tests (`test.skip()` with a tracked ticket), never silently set `retries: 5`.
+- Root-cause flakes with the trace viewer, not by adding `waitForTimeout` (always a smell).
+
+**Forbidden:**
+- `page.waitForTimeout(ms)`: use auto-waiting locators or web-first assertions such as `expect(...).toBeVisible()`.
+- Running E2E against production without a dedicated test account and a rate limit.
+- E2E-testing behaviour already covered by a unit/integration test (slow duplication).
+- Hardcoded sleeps, hardcoded URLs, hardcoded user credentials in test files (use fixtures + env vars).
diff --git a/_blocks/test-fuzz.md b/_blocks/test-fuzz.md
new file mode 100644
index 0000000..5f382c6
--- /dev/null
+++ b/_blocks/test-fuzz.md
@@ -0,0 +1,32 @@
+# TEST: Fuzzing (input-space exploration)
+
+Fuzzing throws semi-random inputs at a target to find crashes, panics, hangs, and undefined behaviour the unit-test author never imagined. Complements `test-gen` (happy/edge/error): fuzz owns the unknown-unknown surface.
+
+**When to fuzz:** parsers, deserializers, protocol handlers, auth/crypto boundaries, any function that accepts untrusted bytes or strings. NOT business logic with well-defined inputs (use property tests instead).
+
+**Per-language tool (default):**
+- **Rust:** `cargo-fuzz` (libfuzzer-sys backend): run `cargo fuzz init`, then write `fuzz_target!(|data: &[u8]| { my_parser(data); })`. Requires a nightly toolchain. Harness lives in `fuzz/fuzz_targets/`. [E4, official: https://rust-fuzz.github.io/book/]
+- **Python:** `hypothesis` in fuzz mode (`@given` + `HealthCheck.too_slow` disabled) for structured inputs; `atheris` (Google, libfuzzer bindings) for bytes-in fuzzing.
[E4, hypothesis.readthedocs.io / github.com/google/atheris]
+- **JS/TS:** `fast-check` with `fc.assert` using `numRuns` of 10_000 or more for fuzz-volume runs; `jsfuzz` for libFuzzer-style bytes fuzzing. [E4, fast-check.dev]
+
+**Corpus management:**
+- Seed corpus = hand-picked valid inputs (1-10 files). Place under `fuzz/corpus//`.
+- Fuzzer mutates corpus → keeps inputs that hit new coverage → corpus grows.
+- Commit corpus to git (gitignore `fuzz/artifacts/`). Treat as test fixture.
+
+**Crash triage:**
+1. Fuzzer dumps crash input under `fuzz/artifacts//crash-`.
+2. Reproduce: `cargo fuzz run fuzz/artifacts//crash-`.
+3. Minimize: `cargo fuzz tmin `, which shrinks the input to a minimal reproducer.
+4. Write a regression unit test using the minimized input BEFORE fixing the bug. The regression test is permanent; the fuzz corpus is ephemeral.
+
+**CI integration (budget-aware):**
+- Short CI run: 60s per target on every PR. Catches regressions, not deep bugs.
+- Nightly run: 1-4h per target on schedule. Upload crashes as artifacts.
+- OSS-Fuzz (free for OSS): submit a `project.yaml` + Dockerfile + build script; Google runs fuzzing on their infra. [E4, google.github.io/oss-fuzz]
+
+**Forbidden:**
+- Fuzzing without a crash-reproducer harness (crashes become irreproducible).
+- Reporting crashes without minimizing them first (`cargo fuzz tmin` or equivalent): full-size crash inputs waste reviewer time.
+- Committing `fuzz/artifacts/` (binary crash bodies, repo bloat).
+- Treating a fuzz hit as "flaky": every crash is a bug until minimized and explained.
diff --git a/_blocks/test-load.md b/_blocks/test-load.md
new file mode 100644
index 0000000..7fcdbf0
--- /dev/null
+++ b/_blocks/test-load.md
@@ -0,0 +1,48 @@
+# TEST: Load / performance testing (baseline → profile → fix)
+
+Load tests answer: "how much traffic does this system handle before SLO violation?" Not "does it work" (unit/integration) but "does it stay up under N RPS for T minutes with p99 < X ms".
The loop is **baseline → profile → fix → re-baseline**, never "run once and ship".
+
+**Tool choice (default):**
+- **`k6`** (Grafana, JS scripting): best for HTTP/REST/WS APIs with scripted scenarios + thresholds; built-in SLO assertions; Docker-friendly. [E4, k6.io]
+- **`vegeta`** (Go, CLI): simplest constant-rate HTTP attacker; great for flat-load smoke tests; results pipe into `vegeta report` and `vegeta plot`. [E4, github.com/tsenart/vegeta]
+- **`oha`** (Rust): modern `hey` replacement, good for quick local baselines, HTTP/2 + HTTP/3. [E4, github.com/hatoolab/oha]
+- **`hyperfine`** (Rust): microbenchmark CLI for single commands / binaries; NOT a web load tool. Use for build-time, cold-start, compile-speed measurements. [E4, github.com/sharkdp/hyperfine]
+
+**SLO definition (write BEFORE running):**
+1. **Latency:** p50 < A ms, p95 < B ms, p99 < C ms (p99 is the tail latency users actually notice).
+2. **Throughput:** sustain N RPS for T minutes without error budget burn.
+3. **Error rate:** < 0.1% 5xx, < 1% 4xx (excluding user errors).
+4. **Resource:** CPU < 70%, memory < 80% of instance, no OOM kills.
+
+Without SLOs written down, "the test passes" is meaningless.
+
+**The loop:**
+1. **Baseline:** lowest realistic load (10 RPS for 1 min). Record latency histogram, CPU, memory. This is the "no-load" floor.
+2. **Ramp:** step-up load (10 → 50 → 100 → 200 RPS, 2 min each). Find the knee: the step where p99 doubles or errors appear.
+3. **Profile at the knee:** attach `perf` / `pprof` / `tokio-console` / `flamegraph`. Identify the top hot function.
+4. **Fix** the hottest contributor (add index, cache, pooling, algorithm swap). ONE change at a time.
+5. **Re-baseline** with the same step-up. The knee should move right (to a higher RPS). If it does not, the fix was wrong: revert and reprofile.
+
+**k6 threshold example (copy into CI):**
+```js
+export const options = {
+  stages: [{ duration: '2m', target: 100 }], // ramp to 100 virtual users over 2 minutes
+  thresholds: {
+    http_req_duration: ['p(95)<500', 'p(99)<1000'], // latency limits in ms
+    http_req_failed: ['rate<0.01'], // fewer than 1% failed requests
+  },
+};
+```
+If any threshold fails, k6 exits non-zero and the CI job goes red.
+
+**CI integration:**
+- Short smoke load test on every PR (30s, low RPS, strict thresholds). Catches obvious regressions.
+- Nightly full load test on a dedicated environment, not shared prod.
+- Publish HTML report (k6 cloud / Grafana) as a CI artifact.
+
+**Forbidden:**
+- Load-testing against production without a killswitch + comms.
+- Running without SLOs defined in the test file itself (no "looks ok" verdicts).
+- Running multiple load tests in parallel against the same target (they interfere with each other).
+- Changing two things between runs ("I added an index AND a cache"): the delta can't be attributed.
+- Ignoring CPU/memory: latency alone hides resource leaks that take the service down after 24h.
diff --git a/_blocks/test-property.md b/_blocks/test-property.md
new file mode 100644
index 0000000..0e98d64
--- /dev/null
+++ b/_blocks/test-property.md
@@ -0,0 +1,36 @@
+# TEST: Property-based testing (invariants + shrinking)
+
+A property test asserts an invariant (a statement true for every valid input), and the framework generates hundreds of inputs automatically. On failure, it shrinks the input to the minimal reproducer. Complements unit tests (which assert on hand-picked examples) and fuzz (which throws bytes at a boundary).
+
+**When to use:** pure functions with stable contracts: parsers (`decode ∘ encode = id`), data structures (insert-then-lookup = hit), serializers, math, state machines with invariants. NOT for side-effectful handlers (use integration tests).
+
+**Per-language tool (default):**
+- **Rust:** `proptest`: `proptest! { #[test] fn roundtrip(s in "\\PC*") { prop_assert_eq!(decode(encode(&s)), s); } }`. Supports stateful tests via `proptest-state-machine`.
Prefer over `quickcheck` (proptest has better shrinking + regression file). [E4, proptest.rs]
+- **Python:** `hypothesis`: `@given(st.integers())` / `@given(st.text())`. Stateful: `hypothesis.stateful.RuleBasedStateMachine`. Regression examples auto-saved under `.hypothesis/`. [E4, hypothesis.readthedocs.io]
+- **JS/TS:** `fast-check`: `fc.assert(fc.property(fc.string(), s => decode(encode(s)) === s))`. Stateful: `fc.commands`. [E4, fast-check.dev]
+
+**Writing a good property:**
+1. **Round-trip:** `f⁻¹(f(x)) == x` (encode/decode, parse/print, serialize/deserialize).
+2. **Idempotence:** `f(f(x)) == f(x)` (normalize, sort, dedupe).
+3. **Invariant:** `op(x)` preserves property P (inserting a new key grows size by exactly 1; sort preserves the multiset of elements).
+4. **Metamorphic:** `f(g(x)) == h(f(x))` (a known change to the input maps to a predictable change in the output).
+5. **Comparison with oracle:** `my_fast_impl(x) == simple_slow_impl(x)` for all x.
+
+**Shrinking:**
+- When a test fails, the framework automatically shrinks the counterexample to the smallest input reproducing the failure.
+- Commit the shrunk example as a regression unit test. Do NOT rely on the `.proptest-regressions` / `.hypothesis/examples` cache alone: commit it, but also pin the hit in a normal test.
+
+**Stateful tests:**
+- Model a state machine: commands + preconditions + postconditions + model state.
+- Framework generates valid command sequences, applies them to SUT and model, asserts equality.
+- Use for data structures, caches, stateful APIs, small DSLs.
+
+**Config discipline:**
+- Use `cases = 1024` as the project default (proptest's own default is 256); bump to 10_000 for CI; lower to 64 for quick local iteration.
+- Seed explicitly for reproducibility in CI logs (`PROPTEST_CASES=10000 PROPTEST_SEED=42`).
+
+**Forbidden:**
+- Property assertions that just restate the implementation (`f(x) == f(x)`).
+- Disabling shrinking ("it took too long"): shrunk output is the whole point.
+- Ignoring a single failing case as "flaky": deterministic properties don't flake; the input found a bug.
+- Mixing property tests with external services (DB, network): properties must be deterministic.
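To illustrate the round-trip property and the shrinking step described in `test-property.md`, here is a minimal dependency-free TypeScript sketch. It is not the fast-check API: `mulberry32`, `randomString`, `shrinkString`, `checkProperty`, and the deliberately buggy `encode`/`decode` pair are all hypothetical names invented for this example, and the greedy delete-one-character shrinker is far simpler than what real frameworks do.

```typescript
// Dependency-free sketch of a property test with shrinking.
// All helper names are hypothetical; this is NOT the fast-check API.

// Seeded PRNG (mulberry32) so a failing run can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generator: strings of length 0..maxLen over a tiny alphabet that includes a space.
function randomString(rand: () => number, maxLen: number): string {
  const alphabet = "abc ";
  const len = Math.floor(rand() * (maxLen + 1));
  let out = "";
  for (let i = 0; i < len; i++) {
    out += alphabet.charAt(Math.floor(rand() * alphabet.length));
  }
  return out;
}

// Contrived system under test: decode trims whitespace, which breaks the round trip.
function encode(s: string): string {
  return "[" + s + "]";
}
function decode(t: string): string {
  return t.slice(1, -1).trim(); // deliberate bug for the demo
}

// Round-trip property: decode(encode(x)) must equal x.
function roundTrips(s: string): boolean {
  return decode(encode(s)) === s;
}

// Greedy shrinker: keep deleting single characters while the input still fails.
function shrinkString(input: string, fails: (s: string) => boolean): string {
  let current = input;
  let progress = true;
  while (progress) {
    progress = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i) + current.slice(i + 1);
      if (fails(candidate)) {
        current = candidate;
        progress = true;
        break;
      }
    }
  }
  return current;
}

interface PropertyResult {
  ok: boolean;
  original?: string; // first generated input that violated the property
  shrunk?: string; // smallest failing input the shrinker could reach
}

// Run the property against `runs` generated inputs; shrink the first failure.
function checkProperty(
  prop: (s: string) => boolean,
  runs: number,
  seed: number
): PropertyResult {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomString(rand, 8);
    if (!prop(input)) {
      return { ok: false, original: input, shrunk: shrinkString(input, (s) => !prop(s)) };
    }
  }
  return { ok: true };
}

const result = checkProperty(roundTrips, 200, 42);
console.log(result);
```

Whatever failing string the generator happens to produce first, the shrinker reduces it to a single space, which is the kind of minimal input worth pinning in a regular regression test. A real suite would use `proptest`, `hypothesis`, or `fast-check` rather than a hand-rolled harness like this one.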