Steering Ladder — technical overview

Dense summary for readers who already know the area. New here? Start with the explainer. S1.0 has not shipped. Most rungs below are pre-registered design intent, not measured results; statuses are tagged inline.

The design pairs a task-agnostic substrate with a lightweight task-conditioned overlay. Negative results from the compression ladder expose handles for runtime intervention. The intended deployable primitive is soft router-logit biasing on calibration-free routing readouts, not per-domain hard-masks. All empirical work is on allenai/OLMoE-1B-7B-0924.
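To make the contrast between hard-masking and soft router-logit biasing concrete, here is a minimal sketch in plain Python. It is illustrative only and is not the S2.0 or S2.1 implementation. The function names (`top_k_route`, `hard_mask`, `soft_bias`), the toy logits, the bias values, and the choice to renormalize weights over the selected experts are all assumptions of the sketch and have not been checked against OLMoE's router configuration.

```python
import math

def softmax(logits):
    """Numerically stable softmax; assumes at least one finite logit."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k):
    """Return (expert_index, weight) pairs for the k largest logits.

    Weights are renormalized over the chosen experts (an assumption
    of this sketch, not a statement about any particular model).
    """
    probs = softmax(logits)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def hard_mask(logits, banned):
    """Hard mask: banned experts get -inf and can never be selected."""
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]

def soft_bias(logits, bias):
    """Soft bias: add a finite per-expert offset; selection stays competitive."""
    return [x + b for x, b in zip(logits, bias)]

# Toy router logits for one token over four hypothetical experts.
logits = [2.0, 1.5, 0.5, -1.0]

baseline = top_k_route(logits, k=2)
masked = top_k_route(hard_mask(logits, {0}), k=2)
biased = top_k_route(soft_bias(logits, [-1.0, 0.0, 0.0, 0.0]), k=2)

print("baseline:", baseline)  # experts 0 and 1, expert 0 dominant
print("masked:  ", masked)    # expert 0 removed outright
print("biased:  ", biased)    # expert 0 still routed, with reduced weight
```

In this toy case the hard mask removes expert 0 from the routing set entirely, while the soft bias keeps it in the top-2 and only lowers its weight. That graded behavior is the property the soft variant is meant to provide.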

Status (2026-05-04)

ID | Lever | Status | Calibration
S0 | Universal safety map (head/expert redundancy) | partial; S1.0 byproduct landed | head-redundancy map measured; full map planned
S1.0 | Head-class hard-zero | REFINE (zero-mask blunt; soft variant queued) | measured (causal screen)
S2.0 | Expert hard-mask ceiling | KILL on steering, causal screen PASS (G3+G5+G6+G7) | measured (causal screen)
S2.1 | Soft router-logit bias (deployable primitive) | pre-registered, queued | planned
S3.0 | Prompt-time routing-readout task detector | pre-registered, cheap | planned
S4.0 | Runtime composition (S3.0 ⨂ S2.1) | gated on S2.1+S3.0 SHIP | planned

Thesis

Per-head and per-expert calibration signatures are stable, task-discriminative, and direction-reversing across corpora (measured, OLMoE-only). None of these properties required probe training or SAE infrastructure; they fall out of compression-ladder forensics. The steering paper is the positive reframe: redundancy maps + soft-bias primitives + cheap task detectors are conjectured to compose into runtime control without per-domain calibration cost. That composition is the program's central hypothesis, not yet a demonstrated outcome.
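As a rough illustration of what a cheap, training-free task detector over routing readouts could look like, the sketch below builds an expert-usage histogram from a prompt's router logits and picks the closest stored signature by cosine similarity. The detector design is not specified in this overview; nearest-signature matching is swapped in purely for illustration. The function names, the two task labels, and every number are hypothetical placeholders, not measured OLMoE data.

```python
import math
from collections import Counter

def usage_histogram(token_logits, k, n_experts):
    """Fraction of top-k routing slots each expert receives across prompt tokens."""
    counts = Counter()
    for logits in token_logits:
        top = sorted(range(n_experts), key=lambda i: logits[i], reverse=True)[:k]
        counts.update(top)
    slots = k * len(token_logits)
    return [counts[i] / slots for i in range(n_experts)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def detect_task(histogram, signatures):
    """Return the name of the signature most similar to the prompt histogram."""
    return max(signatures, key=lambda name: cosine(histogram, signatures[name]))

# Hypothetical per-task expert-usage signatures (made-up values).
signatures = {
    "code": [0.5, 0.4, 0.1, 0.0],
    "prose": [0.0, 0.1, 0.4, 0.5],
}

# Toy router logits: three prompt tokens, four experts (made-up values).
prompt_logits = [
    [2.0, 1.0, 0.0, -1.0],
    [1.5, 1.8, 0.2, -0.5],
    [0.9, 0.1, 1.2, -2.0],
]

hist = usage_histogram(prompt_logits, k=2, n_experts=4)
task = detect_task(hist, signatures)
print(hist, task)
```

Under the composition hypothesis, the detected label would then select a per-expert bias vector to add to the router logits. Whether that composition works in practice is exactly what remains untested.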