Methodology
Before any lever is built, its experiment is written down in full — hypothesis, metric, hyperparameters, and the keep/kill rule — and locked. Only then is it run. This is pre-registration: it stops a result from being reinterpreted after the fact to look like a success. Every rung on the ladder is gated this way.
Calibration. Pre-registration is how the program keeps measured, planned, and conjectured apart. A locked decision rule plus a closed run = measured. A locked decision rule with no run yet = planned. Most rungs are currently the latter.
Each steering lever is gated by a pre-registered prototype experiment with mechanism traps and decision rule before paper-week investment.
Pre-reg structure (per lever)
H0.md— falsifiable hypothesis + KILL-world predictiontau.json— sha-self-locked hyperparameters (axes, prompts, conditions)power.md— effect-size and sample-size argumentdecision.json— populated post-run with gate evaluationspostmortem.md— written on decision
Trap-and-deepen
Every lever pre-reg includes ≥2 mechanism traps:
- Random null: same mask budget, randomized assignment
- Catastrophic sanity: maximal intervention; must produce qualitative pathology
- Repro sanity: baseline reproduces dense forward pass at machine precision
Trap failures kill the gate even if primary metric passes.
Lean role
Lean theorems act as boundary of trust. These are machine-checked proofs
(in Lean 4) — a different and stronger kind of claim than the empirical results: a
proved theorem is true under its stated assumptions, full stop. Each lever cites
existing correctness theorems (gauge_invariance, topk_route_stability,
gap_preserved_under_perturbation) as a trust boundary; it does not carry its
own new proof obligation. So “the lever cites a proved theorem” must not be read as
“the lever’s steering claim is proved” — the steering claims remain empirical
(measured or planned). Loose-first, tighten-later if a lever ships.
Loop ordering
experimental design → run → interpret via Lean modeling → design next.
Skipping the Lean-modeling step is rejected as empirical autopilot.