Steering Ladder — technical overview

Dense summary for readers who already know the area. New here? Start with the explainer. S1.0 has not shipped — most rungs below are pre-registered design intent, not measured results; statuses are tagged inline.

The design is a task-agnostic substrate plus a lightweight task-conditioned overlay. Negative results from the compression ladder expose handles for runtime intervention. The intended deployable primitive is soft router-logit biasing driven by calibration-free routing readouts, not per-domain hard-masks. All empirical work is on allenai/OLMoE-1B-7B-0924.
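
To make the contrast between a hard-mask and a soft router-logit bias concrete, here is a minimal sketch on toy numbers. It is not the project's implementation: the four-expert router, the logit values, the bias vector, and the helper names (`hard_mask`, `soft_bias`, `top_k`) are all assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k):
    """Indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def hard_mask(logits, blocked):
    """Hard-mask: blocked experts get -inf and can never be selected."""
    return [-math.inf if i in blocked else x for i, x in enumerate(logits)]

def soft_bias(logits, bias):
    """Soft bias: add a per-expert offset that a strong logit can still override."""
    return [x + b for x, b in zip(logits, bias)]

# Toy router output for one token over 4 experts (illustrative values only).
router_logits = [2.0, 1.5, 0.2, -0.5]
# Hypothetical bias that discourages expert 0 without forbidding it.
bias = [-1.0, 0.0, 0.0, 0.0]

masked = hard_mask(router_logits, blocked={0})
biased = soft_bias(router_logits, bias)

print("baseline top-2:", top_k(router_logits, 2))
print("hard-mask top-2:", top_k(masked, 2))
print("soft-bias top-2:", top_k(biased, 2))
```

In this toy case the hard-mask removes expert 0 from the top-2 set entirely, while the soft bias only demotes it to second place. That graded behaviour is what the soft variant is meant to offer. Whether it holds up on the real model is exactly what S2.1 is queued to test.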

Status (2026-05-04)

ID	Lever	Status	Calibration
S0	Universal safety map (head/expert redundancy)	partial — S1.0 byproduct landed	head-redundancy map measured; full map planned
S1.0	Head-class hard-zero	REFINE (zero-mask blunt; soft variant queued)	measured (causal screen)
S2.0	Expert hard-mask ceiling	KILL on steering, causal screen PASS (G3+G5+G6+G7)	measured (causal screen)
S2.1	Soft router-logit bias (deployable primitive)	pre-registered, queued	planned
S3.0	Prompt-time routing-readout task detector	pre-registered, cheap	planned
S4.0	Runtime composition (S3.0 ⨂ S2.1)	gated on S2.1+S3.0 SHIP	planned
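
One way a prompt-time routing-readout task detector (S3.0) could look is a nearest-signature lookup over expert-usage histograms. The sketch below is hedged throughout: the histogram readout, the cosine comparison, the `SIGNATURES` table, and the task names are hypothetical stand-ins, and the source does not specify how the planned detector is built.

```python
import math
from collections import Counter

def usage_histogram(token_expert_ids, num_experts):
    """Fraction of routing slots each expert received across a prompt."""
    counts = Counter(e for token in token_expert_ids for e in token)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(num_experts)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def detect_task(readout, signatures):
    """Return the task whose reference signature is closest to the readout."""
    return max(signatures, key=lambda task: cosine(readout, signatures[task]))

# Hypothetical per-task reference signatures over 4 experts (made-up numbers).
SIGNATURES = {
    "code": [0.6, 0.3, 0.1, 0.0],
    "prose": [0.1, 0.1, 0.3, 0.5],
}

# Toy readout: top-2 expert ids chosen for each of three prompt tokens.
prompt_routing = [[0, 1], [0, 1], [0, 2]]
readout = usage_histogram(prompt_routing, num_experts=4)
task = detect_task(readout, SIGNATURES)
print(task)
```

The detector needs no trained probe: it reads routing decisions the model already makes and compares them to stored signatures, which is why the rung is tagged as cheap.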

Thesis

Per-head and per-expert calibration signatures are stable, task-discriminative, and direction-reversing across corpora (measured, OLMoE-only). None of these properties needed probe training or SAE infrastructure — they fall out of compression-ladder forensics. The steering paper is the positive reframe: redundancy maps + soft-bias primitives + cheap task detectors are conjectured to compose into runtime control without per-domain calibration cost. That composition is the program’s central hypothesis, not yet a demonstrated outcome.
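
The conjectured composition can be pictured as a detector output selecting a bias vector that is then added to the router logits, with a fallback to the unmodified substrate when the detector is unsure. This is only a sketch of the hypothesis, not a demonstrated mechanism: `BIAS_TABLE`, the confidence gate, and the `min_confidence` threshold are assumptions introduced here for illustration.

```python
# Hypothetical per-task bias vectors over 4 experts (illustrative values).
BIAS_TABLE = {
    "code": [0.5, 0.0, -0.5, 0.0],
    "prose": [-0.5, 0.0, 0.0, 0.5],
}

def overlay(router_logits, task, confidence, bias_table, min_confidence=0.6):
    """Apply a task-conditioned soft bias, scaled by detector confidence.

    Falls back to the unmodified logits (the task-agnostic substrate) when
    the task is unknown or the detector confidence is below the gate.
    """
    bias = bias_table.get(task)
    if bias is None or confidence < min_confidence:
        return list(router_logits)
    return [x + confidence * b for x, b in zip(router_logits, bias)]

logits = [1.0, 0.8, 0.9, 0.1]

steered = overlay(logits, "code", confidence=1.0, bias_table=BIAS_TABLE)
untouched = overlay(logits, "code", confidence=0.3, bias_table=BIAS_TABLE)
unknown = overlay(logits, "math", confidence=1.0, bias_table=BIAS_TABLE)

print(steered)
print(untouched)
print(unknown)
```

Scaling the bias by confidence keeps the overlay lightweight: a weak detection nudges routing less, and below the gate the model behaves exactly as it would without steering.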
