Methodology

Before any lever is built, its experiment is written down in full — hypothesis, metric, hyperparameters, and the keep/kill rule — and locked. Only then is it run. This is pre-registration: it stops a result from being reinterpreted after the fact to look like a success. Every rung on the ladder is gated this way.

Calibration. Pre-registration is how the program keeps measured, planned, and conjectured apart. A locked decision rule plus a closed run = measured. A locked decision rule with no run yet = planned. Most rungs are currently the latter.

Each steering lever is gated by a pre-registered prototype experiment with mechanism traps and decision rule before paper-week investment.

Pre-reg structure (per lever)

H0.md — falsifiable hypothesis + KILL-world prediction
tau.json — sha-self-locked hyperparameters (axes, prompts, conditions)
power.md — effect-size and sample-size argument
decision.json — populated post-run with gate evaluations
postmortem.md — written on decision

Trap-and-deepen

Every lever pre-reg includes ≥2 mechanism traps:

Random null: same mask budget, randomized assignment
Catastrophic sanity: maximal intervention; must produce qualitative pathology
Repro sanity: baseline reproduces dense forward pass at machine precision

Trap failures kill the gate even if primary metric passes.

Lean role

Lean theorems act as boundary of trust. These are machine-checked proofs (in Lean 4) — a different and stronger kind of claim than the empirical results: a proved theorem is true under its stated assumptions, full stop. Each lever cites existing correctness theorems (gauge_invariance, topk_route_stability, gap_preserved_under_perturbation) as a trust boundary; it does not carry its own new proof obligation. So “the lever cites a proved theorem” must not be read as “the lever’s steering claim is proved” — the steering claims remain empirical (measured or planned). Loose-first, tighten-later if a lever ships.

Loop ordering

experimental design → run → interpret via Lean modeling → design next. Skipping the Lean-modeling step is rejected as empirical autopilot.