- Add insta + tempfile to _assembler/Cargo.toml [dev-dependencies].
- Create tests/common/mod.rs with helpers: seed_tempdir (copies fixtures into an isolated AGENT_ROOT), run_assemble (invokes the built binary via std::process::Command), and assemble_one (end-to-end single-manifest helper).
- Seed tests/fixtures/ with the 4 manifests covered by the golden snapshots (code-implementer, researcher, cost-guardian, patent-compliance) and the 7 blocks they reference (baseline, evidence-grading, memory-protocol, rule-pre-dev-gate, rule-test-first, rule-error-budget, rule-double-audit).

Binary-only crate (no lib target), so integration tests invoke the assemble binary as a subprocess instead of calling internal functions. This exercises the full main.rs I/O + validator + assembler pipeline end-to-end, which is exactly what the determinism claim covers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
14 lines
857 B
Markdown
# EVIDENCE GRADING

Every major claim must carry a grade:

| Grade | Name | Criteria |
|-------|------|----------|
| **E1** | Fact | Confirmed in production OR primary source (official docs, API response, pricing page) |
| **E2** | Verified | Reproducible in tests/benchmarks. Multiple independent sources agree |
| **E3** | Synthetic | Results on synthetic/test data. Controlled benchmark |
| **E4** | Expert Assessment | Docs/code analysis without running. Extrapolation. Literature consensus |
| **E5** | Hypothesis | Theoretical assumption. Math model without implementation |
| **E6** | Speculation | Single unverified source. Outdated data (>6mo) |

Rules:

- Architectural decision → requires E1-E2.
- Financial (compute) decision → ONLY E1.
- Data >6mo without re-verification → downgrade by one grade (e.g. E2 → E3).
- Single source → capped at E4 (no stronger than E4).
- Own benchmark without external confirmation → capped at E3.
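
The rules above can be read as a small grade-adjustment procedure. The following Rust sketch is one possible encoding, not part of the assembler. It assumes that "grade −1" means one step weaker on the E1-E6 scale and that "max E4" / "max E3" are ceilings on how strong a grade may be. The names `Grade`, `Evidence`, `Decision`, `effective_grade`, and `is_sufficient` are hypothetical and chosen only for illustration.

```rust
/// Evidence grades from the table; a lower number means stronger evidence.
/// Derived `Ord` follows declaration order, so E1 < E2 < ... < E6.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Grade {
    E1 = 1,
    E2,
    E3,
    E4,
    E5,
    E6,
}

impl Grade {
    /// Numeric level of the grade (1 for E1 up to 6 for E6).
    fn level(self) -> u8 {
        self as u8
    }

    /// Build a grade from a numeric level, clamping into the E1..=E6 range.
    fn from_level(level: u8) -> Grade {
        match level {
            0 | 1 => Grade::E1,
            2 => Grade::E2,
            3 => Grade::E3,
            4 => Grade::E4,
            5 => Grade::E5,
            _ => Grade::E6,
        }
    }

    /// Apply a ceiling: the result is never stronger than `ceiling`.
    fn cap(self, ceiling: Grade) -> Grade {
        self.max(ceiling)
    }
}

/// Hypothetical flags describing how a claim was sourced.
#[derive(Debug, Clone, Copy, Default)]
struct Evidence {
    /// Data older than 6 months that has not been re-verified.
    stale_unverified: bool,
    /// Only one source supports the claim.
    single_source: bool,
    /// Own benchmark with no external confirmation.
    own_benchmark_unconfirmed: bool,
}

/// Adjust a claimed grade according to the rules (assumed interpretation).
fn effective_grade(claimed: Grade, ev: Evidence) -> Grade {
    let mut grade = claimed;
    if ev.stale_unverified {
        // Assumption: "grade −1" weakens the grade by one step, stopping at E6.
        grade = Grade::from_level(grade.level() + 1);
    }
    if ev.single_source {
        grade = grade.cap(Grade::E4);
    }
    if ev.own_benchmark_unconfirmed {
        grade = grade.cap(Grade::E3);
    }
    grade
}

/// Kinds of decision the rules distinguish; `Other` has no stated minimum.
#[derive(Debug, Clone, Copy)]
enum Decision {
    Architectural,
    Financial,
    Other,
}

/// Check whether a grade is strong enough to back the given decision.
fn is_sufficient(decision: Decision, grade: Grade) -> bool {
    match decision {
        Decision::Architectural => grade <= Grade::E2,
        Decision::Financial => grade == Grade::E1,
        Decision::Other => true,
    }
}
```

Under these assumptions, a claim graded E2 that rests on stale data becomes E3 and can no longer back an architectural decision, and a single-source claim never ends up stronger than E4 regardless of the grade it was first given.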