# MATH FIRST (mandatory for ML / physics / theory work)

1. **Expression first** — 1-3 lines LaTeX/Unicode BEFORE prose
2. **What is UNNECESSARY?** — remove before adding
   - Learned parameters? WHY? Can you do without them?
   - Hyperparameters? WHY? Can they be determined by the input?
   - Activation functions? WHY? Is normalization enough?
   - Separate projection matrices? WHY? Does the input already encode this?
   - Gating? WHY? Does normalization already act as an implicit gate?
   - Separate decoder? WHY? Can you reuse the state directly as output?
3. **Count** — params, hyperparams, FLOPs, memory
4. **ONLY THEN** — proof / plan / code
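
Step 3 can be made concrete with a small counting helper. The sketch below is only an illustration under stated assumptions: `Cost` and `linear_cost` are hypothetical names, a multiply-add is counted as 2 FLOPs, FLOPs are per input vector, and memory covers parameter storage only (no activations or optimizer state).

```python
from dataclasses import dataclass


@dataclass
class Cost:
    params: int        # number of learned scalars
    flops: int         # per input vector; one multiply-add counted as 2 FLOPs
    memory_bytes: int  # parameter storage only (assumption)


def linear_cost(d_in: int, d_out: int, bias: bool = True,
                bytes_per_param: int = 4) -> Cost:
    """Cost of y = W x (+ b) for a single input vector of size d_in."""
    params = d_in * d_out + (d_out if bias else 0)
    flops = 2 * d_in * d_out + (d_out if bias else 0)
    return Cost(params, flops, params * bytes_per_param)


def scale(cost: Cost, k: int) -> Cost:
    """Cost of k identical, independent copies of the same component."""
    return Cost(k * cost.params, k * cost.flops, k * cost.memory_bytes)


# Hypothetical question from step 2: what do three separate d x d
# projection matrices cost compared with using the input directly?
d = 512
one_projection = linear_cost(d, d, bias=False)
three_projections = scale(one_projection, 3)
print(three_projections)
```

Having the numbers on the page before writing any proof, plan, or code makes the "can you do without it?" question a comparison of counts rather than of opinions.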

**Prohibited:** prose before the expression, "fixes" before experimental confirmation, imposing a form instead of deriving it from the input.

**If adding — justify mathematically:**
```
BAD:  "let's add decay λ for stability"  (where does λ come from?)
GOOD: "the normalization step already contains implicit decay — verify experimentally before adding"
```
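
The kind of check the GOOD line asks for might look like the sketch below. It assumes a hypothetical recurrence h_t = normalize(h_{t-1} + x_t) with L2 normalization and random unit-norm inputs; the function names are illustrative, not taken from any particular model. Because the update is linear up to rescaling, the weight of the first input in the state is divided by the pre-normalization norm at every step, so it can be tracked exactly.

```python
import math
import random


def l2_normalize(v):
    """Return (v / ||v||, ||v||)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v], n


def trace_first_input_weight(dim=64, steps=20, seed=0):
    """Track the coefficient of the first input inside the normalized state.

    Assumed recurrence (hypothetical): h_t = normalize(h_{t-1} + x_t).
    No explicit decay parameter appears anywhere in this update.
    """
    rng = random.Random(seed)

    def unit_vector():
        v, _ = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        return v

    h = unit_vector()   # the state starts as the first input: weight 1.0
    weight = 1.0
    weights = [weight]
    for _ in range(steps):
        x = unit_vector()
        h, norm = l2_normalize([a + b for a, b in zip(h, x)])
        weight /= norm  # normalization rescales every past contribution
        weights.append(weight)
    return weights


weights = trace_first_input_weight()
print(weights[0], weights[-1])
```

For nearly orthogonal unit inputs the pre-normalization norm is close to √2, so the first input's weight shrinks roughly geometrically with no λ at all. This holds only for the assumed recurrence and input distribution: an input anti-aligned with the state gives a norm below 1 and temporarily increases the weight, which is exactly why the rule says to verify experimentally on the real data.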