KeiSeiKit-1.0/_assembler/tests/fixtures/_blocks/evidence-grading.md
Parfii-bot e3053df706 test(assembler): add insta dev-dep and fixture-loading helpers
- Add insta + tempfile to _assembler/Cargo.toml [dev-dependencies].
- Create tests/common/mod.rs with helpers: seed_tempdir (copies
  fixtures into an isolated AGENT_ROOT), run_assemble (invokes the
  built binary via std::process::Command), and assemble_one
  (end-to-end single-manifest helper).
- Seed tests/fixtures/ with the 4 manifests covered by the golden
  snapshots (code-implementer, researcher, cost-guardian,
  patent-compliance) and the 7 blocks they reference (baseline,
  evidence-grading, memory-protocol, rule-pre-dev-gate,
  rule-test-first, rule-error-budget, rule-double-audit).

Binary-only crate (no lib target), so integration tests invoke the
assemble binary as a subprocess instead of calling internal functions.
This exercises the full main.rs I/O + validator + assembler pipeline
end-to-end, which is exactly what the determinism claim covers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 04:15:04 +08:00


EVIDENCE GRADING

Every major claim must carry a grade:

| Grade | Name | Criteria |
|-------|------|----------|
| E1 | Fact | Confirmed in production OR primary source (official docs, API response, pricing page) |
| E2 | Verified | Reproducible in tests/benchmarks. Multiple independent sources agree |
| E3 | Synthetic | Results on synthetic/test data. Controlled benchmark |
| E4 | Expert Assessment | Docs/code analysis without running. Extrapolation. Literature consensus |
| E5 | Hypothesis | Theoretical assumption. Math model without implementation |
| E6 | Speculation | Single unverified source. Outdated data (>6mo) |

Rules:

- Architectural decisions → E1 or E2 required.
- Financial (compute) claims → E1 ONLY.
- Data older than 6 months without re-verification → downgrade by one grade.
- Single source → E4 at best.
- Own benchmark without external confirmation → E3 at best.
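The grades and rules above can be encoded as a small check. The Rust sketch below is one way to do it. It assumes grades are ordered E1 (strongest) to E6 (weakest) and treats the "at best" caps as bounds that can only weaken a claimed grade. `Evidence`, `Kind`, `effective_grade`, and `meets_requirement` are hypothetical names for illustration and are not part of the assembler. Only the cap rules and the minimum-grade rules are modeled.

```rust
/// Grades from the table. A derived Ord on a fieldless enum follows
/// declaration order, so E1 < E2 < ... < E6 ("stronger" sorts first).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Grade {
    E1,
    E2,
    E3,
    E4,
    E5,
    E6,
}

/// Hypothetical description of the evidence behind one claim.
struct Evidence {
    claimed: Grade,
    single_source: bool,      // only one source backs the claim
    own_benchmark_only: bool, // own benchmark, no external confirmation
}

/// Hypothetical claim categories with a minimum-grade requirement.
#[derive(Clone, Copy)]
enum Kind {
    Architectural,
    Financial,
    Other,
}

/// Apply the "at best" caps: the weaker of the claimed grade and the cap wins.
fn effective_grade(ev: &Evidence) -> Grade {
    let mut g = ev.claimed;
    if ev.single_source {
        g = g.max(Grade::E4); // single source → E4 at best
    }
    if ev.own_benchmark_only {
        g = g.max(Grade::E3); // own benchmark → E3 at best
    }
    g
}

/// Check the minimum grade a claim category requires.
fn meets_requirement(kind: Kind, g: Grade) -> bool {
    match kind {
        Kind::Architectural => g <= Grade::E2, // E1 or E2
        Kind::Financial => g == Grade::E1,     // E1 only
        Kind::Other => true,
    }
}

fn main() {
    // A claim labeled E2 but resting on a single source drops to E4,
    // which is too weak to justify an architectural decision.
    let ev = Evidence { claimed: Grade::E2, single_source: true, own_benchmark_only: false };
    let g = effective_grade(&ev);
    println!("{:?} architectural-ok={}", g, meets_requirement(Kind::Architectural, g));
}
```

Using `max` on the ordered enum keeps every cap monotonic: applying a cap never strengthens a claim that is already weaker than the cap.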