KeiSeiKit-1.0/_assembler/tests/fixtures/_blocks/evidence-grading.md
Parfii-bot e3053df706 test(assembler): add insta dev-dep and fixture-loading helpers
- Add insta + tempfile to _assembler/Cargo.toml [dev-dependencies].
- Create tests/common/mod.rs with helpers: seed_tempdir (copies
  fixtures into an isolated AGENT_ROOT), run_assemble (invokes the
  built binary via std::process::Command), and assemble_one
  (end-to-end single-manifest helper).
- Seed tests/fixtures/ with the 4 manifests covered by the golden
  snapshots (code-implementer, researcher, cost-guardian,
  patent-compliance) and the 7 blocks they reference (baseline,
  evidence-grading, memory-protocol, rule-pre-dev-gate,
  rule-test-first, rule-error-budget, rule-double-audit).

Binary-only crate (no lib target), so integration tests invoke the
assemble binary as a subprocess instead of calling internal functions.
This exercises the full main.rs I/O + validator + assembler pipeline
end-to-end, which is exactly what the determinism claim covers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 04:15:04 +08:00

# EVIDENCE GRADING

Every major claim must carry a grade:

| Grade | Name | Criteria |
|-------|------|----------|
| **E1** | Fact | Confirmed in production OR primary source (official docs, API response, pricing page) |
| **E2** | Verified | Reproducible in tests/benchmarks. Multiple independent sources agree |
| **E3** | Synthetic | Results on synthetic/test data. Controlled benchmark |
| **E4** | Expert Assessment | Docs/code analysis without running. Extrapolation. Literature consensus |
| **E5** | Hypothesis | Theoretical assumption. Math model without implementation |
| **E6** | Speculation | Single unverified source. Outdated data (>6mo) |

Rules:

- Architectural decision → E1-E2.
- Financial (compute) → ONLY E1.
- Data >6mo without re-verification → grade -1 (downgrade by one level).
- Single source → max E4.
- Own benchmark without external confirmation → max E3.
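
The cap and requirement rules above can be sketched in code. This is only an illustration under stated assumptions: the names `Grade`, `Claim`, `effective_grade`, and `meets_requirement` are hypothetical and are not part of the assembler or of this block. The sketch reads "max E4" as "no stronger than E4". It leaves out the stale-data rule, since that rule needs a timestamp policy this block does not define.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    """Evidence grades from the table; a lower number means stronger evidence."""
    E1 = 1  # Fact
    E2 = 2  # Verified
    E3 = 3  # Synthetic
    E4 = 4  # Expert Assessment
    E5 = 5  # Hypothesis
    E6 = 6  # Speculation


@dataclass
class Claim:
    """Hypothetical claim record; the field names are assumptions for this sketch."""
    grade: Grade
    single_source: bool = False
    own_benchmark_only: bool = False


def effective_grade(claim: Claim) -> Grade:
    """Apply the two cap rules: a capped claim can be no stronger than the cap."""
    grade = claim.grade
    if claim.single_source:
        # Single source -> max E4
        grade = max(grade, Grade.E4)
    if claim.own_benchmark_only:
        # Own benchmark without external confirmation -> max E3
        grade = max(grade, Grade.E3)
    return grade


# Weakest grade still acceptable for each decision kind named in the rules.
REQUIRED = {
    "architectural": Grade.E2,  # E1-E2
    "financial": Grade.E1,      # ONLY E1
}


def meets_requirement(grade: Grade, decision_kind: str) -> bool:
    """Return True if the grade is strong enough for the given decision kind."""
    return grade <= REQUIRED[decision_kind]


if __name__ == "__main__":
    claim = Claim(Grade.E1, single_source=True)
    capped = effective_grade(claim)
    print(capped.name, meets_requirement(capped, "architectural"))
```

Because a lower number means stronger evidence, each cap is a `max` over the enum values, and a requirement check is a plain `<=` comparison.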