- Add insta + tempfile to _assembler/Cargo.toml [dev-dependencies].
- Create tests/common/mod.rs with helpers: seed_tempdir (copies fixtures into an isolated AGENT_ROOT), run_assemble (invokes the built binary via std::process::Command), and assemble_one (end-to-end single-manifest helper).
- Seed tests/fixtures/ with the 4 manifests covered by the golden snapshots (code-implementer, researcher, cost-guardian, patent-compliance) and the 7 blocks they reference (baseline, evidence-grading, memory-protocol, rule-pre-dev-gate, rule-test-first, rule-error-budget, rule-double-audit).

The crate is binary-only (no lib target), so the integration tests invoke the assemble binary as a subprocess instead of calling internal functions. This exercises the full main.rs I/O + validator + assembler pipeline end-to-end, which is exactly what the determinism claim covers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
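The helpers described in this commit could look roughly like the sketch below. It is not the actual contents of tests/common/mod.rs. The names `copy_dir_all` and `seed_root` are hypothetical, the root directory is supplied by the caller instead of being created with `tempfile`, and passing the root through an `AGENT_ROOT` environment variable is an assumption about how the binary locates its files.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Command, Output};

/// Recursively copy every file and subdirectory of `src` into `dst`,
/// creating `dst` if it does not exist yet. (Hypothetical helper.)
fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_all(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Rough stand-in for `seed_tempdir`: populate an isolated root with the
/// fixtures. The real helper presumably creates the root via `tempfile`;
/// here the caller passes it in so the sketch needs only the standard library.
fn seed_root(fixtures: &Path, agent_root: &Path) -> io::Result<()> {
    copy_dir_all(fixtures, agent_root)
}

/// Rough stand-in for `run_assemble`: run the built binary as a child
/// process and capture its exit status, stdout, and stderr.
/// ASSUMPTION: the root is communicated via an `AGENT_ROOT` env variable.
fn run_assemble(bin: &Path, agent_root: &Path, args: &[&str]) -> io::Result<Output> {
    Command::new(bin)
        .args(args)
        .env("AGENT_ROOT", agent_root)
        .output()
}
```

In a Cargo integration test, the path to the binary would typically come from `env!("CARGO_BIN_EXE_assemble")`, assuming the binary target is named `assemble`. Because each test seeds its own root, tests cannot interfere with one another or with the real repository tree.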
857 B
EVIDENCE GRADING
Every major claim must carry a grade:
| Grade | Name | Criteria |
|---|---|---|
| E1 | Fact | Confirmed in production OR primary source (official docs, API response, pricing page) |
| E2 | Verified | Reproducible in tests/benchmarks. Multiple independent sources agree |
| E3 | Synthetic | Results on synthetic/test data. Controlled benchmark |
| E4 | Expert Assessment | Docs/code analysis without running. Extrapolation. Literature consensus |
| E5 | Hypothesis | Theoretical assumption. Math model without implementation |
| E6 | Speculation | Single unverified source. Outdated data (>6mo) |
Rules:
- Architectural decision → requires E1-E2.
- Financial (compute) decision → ONLY E1.
- Data >6mo without re-verification → downgrade by one level (e.g. E2 becomes E3).
- Single source → max E4.
- Own benchmark without external confirmation → max E3.
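The rules above can be read as a small grading function. The sketch below is one possible encoding, not part of the block itself. The type and function names are hypothetical, "grade −1" is interpreted as a one-level downgrade (E2 becomes E3, with E6 as the floor), and applying the caps before the staleness penalty is an assumption, since the rules do not specify an order.

```rust
/// Evidence grades from the table. E1 is the strongest, E6 the weakest,
/// so a numerically larger level means weaker evidence.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Grade { E1 = 1, E2, E3, E4, E5, E6 }

impl Grade {
    /// Map a numeric level back to a grade, clamping to the E1..E6 range.
    fn from_level(n: u8) -> Grade {
        match n {
            0 | 1 => Grade::E1,
            2 => Grade::E2,
            3 => Grade::E3,
            4 => Grade::E4,
            5 => Grade::E5,
            _ => Grade::E6,
        }
    }

    /// "max E4" means the grade can be no stronger than E4, so we keep
    /// whichever of the two grades is weaker.
    fn cap(self, ceiling: Grade) -> Grade {
        self.max(ceiling)
    }
}

/// Hypothetical description of a claim before the rules are applied.
struct Claim {
    base: Grade,
    single_source: bool,
    own_benchmark_only: bool,
    stale_unverified: bool, // data >6mo without re-verification
}

/// Apply the caps, then the staleness downgrade (ordering is an assumption).
fn effective_grade(c: &Claim) -> Grade {
    let mut g = c.base;
    if c.single_source { g = g.cap(Grade::E4); }
    if c.own_benchmark_only { g = g.cap(Grade::E3); }
    if c.stale_unverified { g = Grade::from_level(g as u8 + 1); }
    g
}

/// Kinds of decision named in the rules; `Other` has no grade requirement.
enum Decision { Architectural, Financial, Other }

/// Check whether evidence of grade `g` is strong enough for the decision.
fn is_admissible(d: Decision, g: Grade) -> bool {
    match d {
        Decision::Architectural => g <= Grade::E2,
        Decision::Financial => g == Grade::E1,
        Decision::Other => true,
    }
}
```

Under this reading, a claim that would otherwise be E1 but rests on a single source is capped at E4 and therefore cannot support an architectural or financial decision.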