experiment: implement masked br_table dispatch for variant switches (n ≥ 4 arms)#5927
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
At compile time, find a bitmask M (and shift S = ctz(M)) such that
(hash_i & M) >> S are all distinct for the n variant tags. Then emit:
local.get $tag_field
i32.const M ;; compile-time constant
i32.and
i32.const S ;; omitted when S = 0
i32.shr_u ;; omitted when S = 0
br_table ... ;; O(1) dispatch
compared to the previous O(n) linear comparison chain. The break-even
point is n = 3 in both the worst and the average case, but n = 1 and
n = 2 are already handled by single_case / simplify_cases, so the new
path is a strict win for every applicable case (n ≥ 4 with all-TagP arms).
Mask-finding uses Gosper's hack to iterate candidate masks in order
of increasing value, ensuring compact (low-index) masks are tried
first and table sizes remain small. Threshold: max(64, 4n).
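The mask search described above can be sketched in Python (the real implementation is OCaml in the codegen backends; `gosper_next` and `find_variant_mask` here follow the commit text, with the `max(64, 4n)` table cap applied to the compacted slot range):

```python
def gosper_next(x: int) -> int:
    """Next integer with the same popcount as x (Gosper's hack)."""
    u = x & -x            # isolate lowest set bit
    v = x + u             # ripple the carry upward
    return v + (((v ^ x) // u) >> 2)

def find_variant_mask(hashes, max_iters=1 << 16):
    """Find the smallest mask M (by value, so compact masks come first) with
    shift S = ctz(M) such that (h & M) >> S is injective over the tag hashes.
    Returns (M, S) or None when no mask exists / the iteration budget runs out."""
    n = len(hashes)
    if len(set(hashes)) < n:          # duplicate tags: no injective mask exists
        return None
    for k in range(max(1, (n - 1).bit_length()), 33):  # popcount of candidates
        m, iters = (1 << k) - 1, 0    # smallest k-bit mask
        while m < (1 << 32) and iters < max_iters:
            s = (m & -m).bit_length() - 1              # ctz(m)
            slots = {(h & m) >> s for h in hashes}
            if len(slots) == n and (m >> s) + 1 <= max(64, 4 * n):
                return m, s
            m, iters = gosper_next(m), iters + 1
    return None
```

Because Gosper's hack enumerates same-popcount masks in increasing value, low-bit (compact) masks are tried before high-bit ones, keeping the resulting `br_table` small.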
Also: add test/run/variant_switch.mo covering 4-arm, 7-arm and
payload-carrying variant switches; add "same-body arm merging" to
the plan as a future optimisation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
br_table dispatch for variant switches (n ≥ 4 arms)
…d.ml) Same optimisation as the classical backend: n ≥ 4 all-TagP variant switches get masked br_table dispatch instead of a linear comparison chain. The EOP backend uses int64 hashes throughout, so the helpers use Int64 arithmetic. All bench tests now pass (instruction counts updated to reflect the savings). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
br_table always expects an i32 operand; the EOP backend operates on i64 values, so add an i64.to_i32 (WrapI64) conversion after the masked/shifted tag before the br_table instruction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…imisation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmarks the masked br_table dispatch by traversing a synthetic ~700-node expression tree (constructors: Var, Lit, App, Lam, Let, LetRec, Case, Con) 10_000 times and reporting instruction counts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs in compile_enhanced.ml's masked br_table dispatch for EOP:

1. iter_masks_with_popcount iterated all 64-bit k-bit patterns. EOP variant hashes are extend_i32_u values (bits 0-31 only), so masks with bits ≥ 32 are useless. For k ≥ 4 this blew up: C(64,4) = 635k vs C(32,4) = 36k, causing the compiler to hang. Fix: cap the loop at k > 32 / mask ≥ 2^32, matching the classical int32 backend.

2. Nat64.of_int64 mask crashed when the mask was negative (bit 63 set). Gosper's hack can produce such masks before the new early-exit terminates. Fix: replace with a local ctz64 that works for any non-zero int64 by isolating the lowest set bit via a shift loop.

Also adds test/bench/variant-switch.mo: a GHC-Core-like 8-arm expression interpreter bench (Var/Lit/App/Lam/Let/LetRec/Case/Con) exercising the hot-path switch dispatch at 10k iterations over a 24-node tree.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
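The ctz replacement described in fix 2 can be modeled in Python (the real helper is OCaml over int64; here the input is masked to 64 bits so that a bit-63 mask, negative as int64, works too):

```python
def ctz64(x: int) -> int:
    """Count trailing zeros of a non-zero 64-bit value by isolating the
    lowest set bit and shifting it down -- no Nat64.of_int64 conversion,
    so a mask with bit 63 set cannot crash it."""
    x &= (1 << 64) - 1
    assert x != 0, "ctz64 is undefined for zero"
    low = x & -x          # isolate lowest set bit
    n = 0
    while low > 1:
        low >>= 1
        n += 1
    return n
```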
…e s)
Cases 0 and 1 previously returned leaf nodes (#Var "x", #Lit d) ignoring
the sub-tree s, causing the tree to reset every 8 levels and stay at ≤24
nodes regardless of depth. Replace with #App(#Var "x", s) and #Lam("k", s)
so every level wraps s; #Case at d%8=6 doubles s, giving exponential growth.
build 15 now produces 80 nodes (800k total/10k iterations, ~70M instructions)
vs the old 24-node cycle.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add a 2^10 iteration cutoff to iter_masks_with_popcount in both backends to prevent O(C(32,k)) compile-time blowup on switches where no injective mask exists (e.g. nested-pattern switches with duplicate outer labels)
- Add a distinct-labels guard to the SwitchE br_table branch: only fire when all outer TagP labels are unique, as the known_tag_pat arm codes (outer tag check stripped) are only correct for flat variant dispatch
- Document both issues and the deeper fix (a None fallback should fall through to the regular handler) in the plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r dispatch) Move find_variant_mask into the `when` guard in both backends so that a None result (cutoff reached, no valid mask, or duplicate labels) causes the guard to fail and OCaml falls through to the regular SwitchE handler with full patterns. This eliminates:

- The broken None branch that used known_tag_pat arms (no outer tag check), which caused incorrect dispatch (e.g. debug_show on the 12-arm Action_ type routed #RegisterKnownNeuron to #AddOrRemoveNodeProvider)
- The distinct-labels workaround (now fully subsumed: duplicate labels make is_injective fail for every mask, so find_variant_mask returns None)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
br_table dispatch for variant switches (n ≥ 4 arms)
…-31 masks
- Raise Gosper iteration cutoff from 2^10 to 2^16 in both backends,
enabling larger variant types (e.g. 12-arm NNS Action_ type) to find
a compact mask where the 2^10 limit would time out
- Change classical backend loop guard from `!m <> 0l` to `!m > 0l`:
stops at zero (wrapped past 2^32) AND at negative int32 (bit 31 set).
Motoko hashes are 31-bit (Mo_types.Hash.hash always clears bit 31),
so masks with bit 31 set are irrelevant and Nat32.of_int32 would crash
on them with Invalid_argument("value out of bounds")
- EOP backend already caps at 0x1_0000_0000 (32-bit range); 31-bit cap
is implicitly safe there since hashes fit in 31 bits
- Document resolution of plan item 3 (31-bit vs 32-bit hashes)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document a future optimisation replacing the single Gosper stream with concurrent strategy generators (MaskShift batched by bit-window, ModPrime, RotLow) merged round-robin and ranked by cycle-cost estimate. This avoids any single strategy's worst case dominating compile time, and subsumes the Pre-shortening section. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a 9th variant `#Prim : Char` for primitive operations (needed for upcoming `fib` benchmark). Wrapped as `#App (#Prim '+', s)` in `build` so the recursive sub-tree is preserved and node count stays ~82. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Encodes fib over Peano naturals using #LetRec/#Lam/#Case/#Con/#Prim. Currently unused (_fibCore); will serve as the benchmark program for an upcoming eval function. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Avoids computing #App (#Prim '-', #Var "n") twice in the recursive arm. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also removes _ prefix from fibCore now that it is used. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…both Adds:

- Val/Env runtime types (Peano naturals via #VCon)
- Direct AST interpreter: eval : (Expr, Env) -> Val
- Finally-tagless machinery: FT = Env -> Val, Symantics record, transform
- evalSem: the evaluating Symantics (record of closures, no variant dispatch)
- evalBench: runs fib(7) 100× via eval and via FT, reports instruction counts

Result: fib(7) = 13 correct for both; FT ~5% fewer instructions than direct eval.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
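The direct-eval vs finally-tagless contrast can be illustrated with a tiny Python model (the benchmark itself is Motoko; the mini-AST and names here are purely illustrative). The point is that `transform` pays the variant dispatch once, producing a closure whose hot loop contains no tag tests:

```python
# Direct AST interpretation: every evaluation step dispatches on the node tag.
def eval_ast(e, env):
    tag = e[0]
    if tag == "lit":
        return e[1]
    if tag == "add":
        return eval_ast(e[1], env) + eval_ast(e[2], env)
    if tag == "var":
        return env[e[1]]
    raise ValueError(tag)

# Finally-tagless: each constructor becomes a function Env -> value.
# The AST is walked once, up front; the returned closure has no tag dispatch.
def transform(e):
    tag = e[0]
    if tag == "lit":
        v = e[1]
        return lambda env: v
    if tag == "add":
        f, g = transform(e[1]), transform(e[2])
        return lambda env: f(env) + g(env)
    if tag == "var":
        x = e[1]
        return lambda env: env[x]
    raise ValueError(tag)

ast = ("add", ("lit", 6), ("var", "n"))
ft = transform(ast)   # compiled once, then reusable across environments
```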
Benchmarks the `transform(evalSem, fibCore)` step (100 iterations) and verifies correctness via `fib7_xform`. Updates `.ok` with new instruction counts including `instr_transform`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary

Adds `test/bench/variant-switch.mo` — a small GHC-Core-like interpreter that exercises 9-arm variant-switch dispatch in several shapes. Serves as a reference point (baseline on `master`) for the dispatch-optimisation work on #5927 (masked `br_table`) and any follow-ups.

Actor methods:

- `go` — top-level `size tree + size fibCore`, ×10k.
- `evalBench` — `fib(7)` via direct AST eval, compiled finally-tagless form, and the AST→FT transform itself; ×100 each.
- `weekdayBench` — `isWeekend` (7 explicit arms) vs `isWeekendOr` (same dispatch via `or`-patterns); ×10k over a 7-arm `Weekday` variant.
- `getPerfData` — reports `rts_lifetime_instructions`.

## Master baseline

| Metric | Instructions |
|---|---|
| `size tree + fibCore` (×10k) | 137,590,321 |
| `eval fib(7)` AST (×100) | 24,509,348 |
| `eval fib(7)` FT (×100) | 21,536,248 |
| AST→FT transform (×100) | 1,189,148 |
| `isWeekend` (×10k × 7 arms) | 10,010,321 |
| `isWeekendOr` (×10k × 7 arms) | 11,050,321 |

With #5927 applied, the explicit-arm dispatch numbers drop materially; or-patterns currently don't benefit (separate follow-up).

## Test plan

- [x] `make -C test/bench variant-switch.only` passes on `master`.
- [x] drun output captured in `test/bench/ok/variant-switch.drun-run.ok`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors isWeekend exactly but collapses the 5 weekday cases into one or-pattern arm and the 2 weekend cases into another. Exercises the same-body arm-merging path noted in 9d7c498. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V1 (IR-level, `SwitchE` with `TagP` arms) is blind to or-patterns: the
semantically-identical `isWeekday` (7 flat arms) and `isWeekdayOr`
(2-arm or-pattern) end up with different IR shapes, so only the
former hits the `br_table` dispatch path.
Revise "Where to Apply" to introduce Option C: recognise the
tag-hash-compare fragments at the `patternCode` EDSL level, where
both shapes have already elaborated to the same `(^^^)` chain.
Architecture is three handlers using OCaml 5.3 algebraic effects
(same machinery as ConstTrack Phase 3):
- Recognizer: perform `Variant_arm {hash; body_code}` at each
tag-compare site.
- Strategy query: at `compile_switch` entry, collect the effect
trace and perform `Dispatch_strategy tag_set` to let the
enclosing handler choose a plan (MaskShift / ModPrime / RotLow
/ Linear).
- Emit: dispatch on the returned plan; each emitter is
independently unit-testable.
Annotate the "Same-body arm merging" section to note it becomes
automatic under V2 — distinct arms with identical bodies naturally
collide on effect-payload equality.
V1 stays in place; V2 lands first in `compile_enhanced.ml` (where
`patternCode` lives), then backports to classical if the
architecture proves out.
Three refinements on top of the previous V2 reframing:
1. Handler ↔ Recognizer become distinct roles, not a 3-step pipeline:
- Handler sees the IR dispatch node + its type; knows which
strategies are meaningful for this decision shape.
- Recognizer lives inside the matching EDSL and sees the fully-
elaborated "test and branch" fragments; performs
`Match_decision { token_set; scrutinee_repr; type_info }`.
2. The protocol is explicitly generic — not variant-specific. The
handler interprets `token_set` (tag hashes, literal immediates,
nominal IDs, …); `scrutinee_repr` abstracts how to obtain the
discriminating value. Future applications flagged: AND-patterns
(where the handler can short-circuit components already matched
by an outer context) and literal-match chains.
3. V2 launch scope narrowed to Gosper-based MaskShift only. The
multi-strategy batched search stays listed as future work; the
protocol is forward-compatible.
Success criterion: or-pattern switches must compile to byte-identical
Wasm as their hand-expanded flat-arm equivalents (same mask, shift,
table, arm blocks modulo label numbering). FileCheck test pinning
this equivalence is the V2 acceptance gate.
…ing)
Earlier drafts left ambiguous whether the recognizer walks an
elaborated EDSL tree or fires effects during emission. The EDSL's
value type is opaque — `patternCode = CannotFail of G.t | CanFail of
(G.t -> G.t)` — so no walkable AST survives composition. Clarify:
- Recognizer fires `Match_decision` effects *during* the procedural
emitting-combinator calls (`fill_pat`, `compile_pat_local`, …),
not after.
- `(^^^)`, `orElse`, `orsPatternFailure` stay pure G.t manipulation.
Effects attach at leaf combinators that know what's being
discriminated (TagP, AltP-over-TagPs, later LitP).
- `body_compiler` is a thunk the handler chooses to invoke or not,
giving it *control over emission* rather than just strategy
selection. This is what makes future AND-patterns (where a
component may already be known from an outer context) a natural
extension: the handler returns No_op and suppresses emission.
List the concrete perform-sites for V2: `TagP`, `AltP` bottoming out
in `TagP` (the or-pattern fold), and the future `LitP` extension.
An or-pattern over tag constructors — e.g. `(#mon | #tue | ... | #fri) false` — was previously an opaque single arm from the V1 br_table guard's perspective, so `isWeekdayOr` did not get the `br_table` treatment that the structurally-identical `isWeekday` (7 flat arms) received.

Extend the guard to recognise `AltP` chains bottoming out in `TagP` leaves and count the leaves (not the cases) toward the 4-arm threshold. Per arm, compile the body once using the first leg's sub-pattern (Motoko's or-pattern typing guarantees all legs bind the same variables). Every leaf of a case contributes one slot in the dispatch table pointing to the same arm block, so same-body arm merging now happens automatically for or-patterns — the emitted Wasm is strictly smaller than the hand-expanded flat equivalent while running the same number of instructions.

Benchmark (`test/bench/variant-switch.mo`, `instr_isWeekendOr`): 11_050_321 → 7_070_321 (1.56×), matching `instr_isWeekend` exactly. Other bench rows also improved where or-patterns nest in the dispatch path.

Mirrors the change in both `compile_classical.ml` and `compile_enhanced.ml`; introduces a shared `flatten_tag_leaves` helper next to `known_tag_pat`. The effect-based handler/recognizer split per `.claude/plans/variant-switch-br-table.md` lands as a follow-up refactor.
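The leaf-flattening step can be modeled in Python (the actual `flatten_tag_leaves` is OCaml over IR patterns; the tuple encoding of `TagP`/`AltP` below is illustrative):

```python
def flatten_tag_leaves(pat):
    """Collect the tag hashes at the leaves of an or-pattern chain.
    Returns None if any leaf is not a TagP, in which case the switch
    falls back to the regular linear handler."""
    kind = pat[0]
    if kind == "TagP":                       # ("TagP", hash)
        return [pat[1]]
    if kind == "AltP":                       # ("AltP", left, right)
        left = flatten_tag_leaves(pat[1])
        right = flatten_tag_leaves(pat[2])
        if left is None or right is None:
            return None
        return left + right
    return None                              # literal, wildcard, ...: not flat tags

# A 5-leg or-pattern elaborates to nested AltP; every leaf contributes one
# br_table slot pointing at the same arm block.
weekdays = ("AltP", ("TagP", 1), ("AltP", ("TagP", 2),
           ("AltP", ("TagP", 3), ("AltP", ("TagP", 4), ("TagP", 5)))))
```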
… (enhanced)
First concrete step of the V2 handler/recognizer refactor described in
.claude/plans/variant-switch-br-table.md. Introduce a `Dispatch` module
with a `Query : int64 list list -> plan` algebraic effect (OCaml 5.3,
same machinery as ConstTrack Phase 3). The default handler runs
Gosper-based mask-finding and returns a `MaskShift { mask; shift;
table_size; slot_for_case }` plan, or `Linear` as a fallback.
The recognizer at the SwitchE variant case now collects per-case leaf
hashes (one sub-list per case, with or-pattern legs contributing
multiple entries) and asks the handler for a plan. The guard checks
`MaskShift`; the body binds the plan's fields and emits the br_table
dispatch using `slot_for_case` directly instead of rebuilding it
inline.
Behaviour-preserving: compiled Wasm is byte-identical to the previous
commit for `isWeekday` and `isWeekendOr` (same mask 0x15000, shift 12,
23-slot table, same block labels). `variant_switch.mo` passes all
phases.
Why this shape is useful even before landing further refactors:
- Strategy logic (currently Gosper) is encapsulated in one place.
Adding ModPrime / RotLow / Linear heuristics becomes a change in
the handler, not in SwitchE.
- Outer scopes (tests, debug flags, size-budget passes) can install
their own handler to override the plan without touching the
recognizer.
- The protocol is token-agnostic — future LitP / AndP dispatch can
perform the same effect with their own token types.
Scope: enhanced backend only. `compile_classical.ml` keeps the inlined
extraction for now and can be retrofitted once the architecture proves
out (per user priority: tests live under enhanced).
Known duplication: the SwitchE guard currently runs `Dispatch.Query`
a second time in the body to pattern-match the plan's fields. Threading
one plan through guard and body is a follow-up — kept simple here to
minimise the diff and make the effect protocol the sole behavioural
change.
…anced) The previous commit queried `Dispatch.Query` twice — once in the `when` guard (to test `MaskShift`) and once in the body (to destructure the plan). The guard fired side-effect-free and the result was discarded.

Collapse the guard + variant arm + default arm into a single `SwitchE (e, cs) ->` arm that computes `maybe_plan` once. `Some MaskShift` emits br_table; `Some Linear` or `None` (any case failing `flatten_tag_leaves`) falls through to the linear-chain emission inlined under the same arm. Scrutinee compilation (`code1`, `set_i`/`get_i` local) is hoisted out and shared between the two paths.

Byte-identical Wasm to the previous commit for `isWeekday` and `isWeekendOr` (same dispatcher: mask 0x15000, shift 12, 23-slot table; same block labels). `variant_switch.mo` and `variants.mo` both pass. Net: -16 lines and only one `Dispatch.with_handler` invocation per SwitchE node.
Reference: weekday variant hashes

Recovered from the compiled Wasm. Gosper's mask picks the three bits at positions 12, 14, 16. Pattern reading: bit 16 of the hash == weekend-side (set for Fri/Sat/Sun, clear for Mon/Tue/Wed/Thu); bits 12/14 distinguish days within each half.

Strategy notes for the dispatch handler:
startLetter (7 flat arms) — 7 arm blocks

| slot | case idx | tag | outcome |
|---|---|---|---|
| 0 | 1 | #Tue | 'T' |
| 1 | 3 | #Thu | 'T' |
| 4 | 0 | #Mon | 'M' |
| 5 | 2 | #Wed | 'W' |
| 16 | 4 | #Fri | 'F' |
| 20 | 6 | #Sun | 'S' |
| 21 | 5 | #Sat | 'S' |
The Tue and Thu arm blocks are byte-identical (both load i64.const 741070837121024 = 'T'); likewise Sat and Sun both load 732274744098816 = 'S'. Duplicated code, but each block is entered via exactly one br_table slot, so each call costs the same regardless of duplication.
startLetterOr (5 arms, 2 or-patterns) — 5 arm blocks
| slot | case idx | tag | outcome |
|---|---|---|---|
| 0 | 1 | #Tue | 'T' (case shared with #Thu) |
| 1 | 1 | #Thu | 'T' (same case) |
| 4 | 0 | #Mon | 'M' |
| 5 | 2 | #Wed | 'W' |
| 16 | 3 | #Fri | 'F' |
| 20 | 4 | #Sun | 'S' (case shared with #Sat) |
| 21 | 4 | #Sat | 'S' (same case) |
Same br_table, fewer arm blocks (two pairs collapsed).
Bench numbers (10 000 × 7-day iterations)
| metric | instructions |
|---|---|
| instr_isWeekend | 7_070_321 (incl. outer if branch cost) |
| instr_isWeekendOr | 7_070_321 |
| instr_startLetter | 6_270_321 (clean dispatch + Char sink) |
| instr_startLetterOr | 6_270_321 |
Key observation for strategy design: startLetter and startLetterOr cost exactly the same at runtime despite the or-pattern form having 2 fewer arm blocks. Same-body arm duplication is a pure code-size bloat — each slot's br_table entry jumps directly into its own block, which executes the same ≈3 Wasm instructions regardless of whether a byte-identical block lives next to it.
Where same-body merging would matter: upstream of the dispatcher. If the recognizer were to merge equivalence classes before handing the token set to the handler, N = 5 instead of N = 7. Then ModPrime could try mod 5 instead of mod 7 (half the table bytes), MaskShift would have fewer injectivity constraints (possibly a smaller mask), and perfect-hash search gets cheaper. That's the lever the plan now captures under Future Optimisation: Same-body arm merging — with the refinement that the equivalence criterion should be raw Wasm byte sequences (each arm is already a Block internally and (^^^) is difference-list concatenation of G.t), not IR structural equality.
For V2 as shipped here, or-pattern merging is the only channel; cross-case merging stays user-driven (write the or-pattern to get the size win).
…valence later Rework the "Future Optimisation: Same-body arm merging" section to record today's scope decision and the refinement direction. Key points captured:

- V2 deliberately does NOT auto-merge arms with structurally-equal bodies across distinct cases. User-written or-patterns are the incentive channel — they communicate intent and are stable under refactoring. The recognizer's `flatten_tag_leaves` already collapses or-pattern legs; cross-case merging stays out of scope.
- Why merging matters upstream: same-body merging is a code-size win (duplicated arm blocks saved) but NOT a speedup for the already-dispatched case — each br_table slot still lands in its own block executing the same instructions. The runtime payoff is in *strategy choice*: the handler's search space is parameterised by N = distinct outcome classes, so ModPrime uses a smaller prime, MaskShift has fewer injectivity constraints, and perfect-hash search gets cheaper.
- When cross-case merging eventually lands, the equivalence criterion should be raw Wasm byte sequences, not IR structural equality. Each arm is already a `Block` internally; `(^^^)` composition is difference-list concatenation of `G.t`; comparing compiled bytes skips IR phase-ordering noise and catches arms that incidentally lower to the same instructions. The `Dispatch.Query` protocol is already compatible (token_set is list-of-lists) — this is a recognizer-side extension, not a protocol change.
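The byte-equivalence grouping could look roughly like this (hypothetical recognizer-side helper, not present in the codebase: `token_sets` is the list-of-lists of leaf hashes per case and `arm_bytes` the compiled arm bytes per case):

```python
def merge_same_body(token_sets, arm_bytes):
    """Group cases whose compiled arm bytes are identical; merged classes pool
    their leaf hashes, shrinking N (distinct outcome classes) before the
    handler's strategy search -- e.g. mod 5 instead of mod 7 for the
    startLetter weekday example."""
    classes = {}                      # compiled bytes -> pooled leaf hashes
    for leaves, body in zip(token_sets, arm_bytes):
        classes.setdefault(body, []).extend(leaves)
    return list(classes.values())
```

Because the key is the compiled byte sequence, two cases that merely *lower* to the same instructions merge even when their IR differs structurally.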
The existing isWeekend/isWeekendOr pair returns Bool, so an outer
`if (isWeekend d) acc1 += 1` branch muddies the per-switch cost.
Add a 7-way distinct-outcome pair that writes a Char sink directly,
giving a cleaner microbench of switch dispatch alone.
Bodies of startLetter include two natural same-body groups:
{Tue, Thu} → 'T' and {Sat, Sun} → 'S'. The -Or form collapses these
into or-patterns. Both compile to the same br_table dispatcher and
execute identical instruction counts — same-body arm blocks cost the
same regardless of whether they're physically one block (or-pattern)
or duplicated (flat). Useful datapoint when evaluating future Dispatch
strategies.
On the current branch (Gosper MaskShift, same-body merging via
or-patterns only):
instr_isWeekend = 7_070_321 (outer `if` adds ~800k)
instr_isWeekendOr = 7_070_321
instr_startLetter = 6_270_321 (cleaner: no outer branch)
instr_startLetterOr = 6_270_321
…tch_join) Replaces the single one-shot effect `Query : int64 list list -> plan` with a streaming pair:

- `Match_arm : int64 list -> unit` (submit one case's leaves)
- `Match_join : plan` (join all arms into a plan)

The recognizer at SwitchE now iterates cases, performs `Match_arm hashes` per case, then `Match_join` to receive the plan. The handler accumulates arms in a mutable ref across the stream and commits the Gosper-based plan at join time.

Behaviour-preserving: variant_switch.mo and variants.mo pass; the br_table for startLetter is byte-identical to the non-streaming version (same mask 0x15000, shift 12, 23-slot table, same labels).

Why change the shape now, before any consumer needs it? The streaming protocol is how AND-patterns and literal-match chains will surface decisions — subcomponents fire `Match_arm` incrementally, and only the outer context knows when to `Match_join`. Locking the protocol in now means those future recognizers slot in without a breaking change to the effect type. The nested-switch case also works cleanly because each `with_handler` scope has its own accumulator ref; state doesn't leak between switches.

The naming `Match_join` rather than `Match_close` reflects the semantics: the handler joins submitted arms into one dispatch decision; it's not merely closing a stream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
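The streaming protocol can be modeled in Python with an explicit handler object standing in for the OCaml 5.3 effect handler (plan constructors and the 4-leaf threshold follow the commit text; the Gosper-based mask search itself is elided):

```python
class DispatchHandler:
    """One handler scope per SwitchE node: the accumulator below plays the
    role of the mutable ref in the OCaml handler, so nested switches never
    share state."""
    def __init__(self):
        self.arms = []

    def match_arm(self, leaf_hashes):       # models Match_arm : int64 list -> unit
        self.arms.append(list(leaf_hashes))

    def match_join(self):                   # models Match_join : plan
        total_leaves = sum(len(a) for a in self.arms)
        if total_leaves < 4:
            return ("Linear",)
        # ... Gosper-based MaskShift search over self.arms would run here ...
        return ("MaskShift", self.arms)

# Recognizer side: stream each case's leaves, then join.
h = DispatchHandler()
for case in [[10], [20, 21], [30], [40]]:   # second case: or-pattern, 2 legs
    h.match_arm(case)
plan = h.match_join()
```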
…stration
Adds the second concrete plan variant `ModPrime { p; case_for_residue }`
to the `Dispatch` protocol, emitted as `hash rem_u p; br_table` with
a p-slot table. The handler searches primes {2, 3, 5, 7, ..., 31}
smallest-first and accepts the first that partitions the input
cleanly (all leaves of the same case share a residue AND different
cases land on different residues).
The `choose_plan` policy is intentionally ad-hoc:
- if `n < 4` → Linear
- else if `c < n` → ModPrime (fall back to MaskShift if no prime works)
- else → MaskShift
where `n` = total leaves, `c` = number of cases. `c < n` ↔ at least
one case has an or-pattern.
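The smallest-prime search and the ad-hoc policy can be sketched in Python (the real handler is OCaml; plan names follow the commit, and `token_sets` is one leaf-hash list per case):

```python
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31)

def find_mod_prime(token_sets):
    """Smallest prime p such that all leaves of each case share one residue
    mod p AND different cases land on different residues."""
    for p in PRIMES:
        per_case = [{h % p for h in case} for case in token_sets]
        if all(len(r) == 1 for r in per_case):          # within-case agreement
            flat = [r.pop() for r in per_case]
            if len(set(flat)) == len(flat):             # cross-case distinctness
                return p
    return None

def choose_plan(token_sets):
    n = sum(len(case) for case in token_sets)   # total leaves
    c = len(token_sets)                          # number of cases
    if n < 4:
        return ("Linear",)
    if c < n:                                    # at least one or-pattern present
        p = find_mod_prime(token_sets)
        return ("ModPrime", p) if p is not None else ("MaskShift",)
    return ("MaskShift",)
```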
Why the ad-hoc split matters: it gives the handler a *visible,
measurable* reason to branch on or-pattern structure, proving the
Dispatch protocol is wired up to select different strategies for
or-patterns vs flat expansions of the same switch. Bench numbers
after this patch:
instr_isWeekend = 7_070_321 (MaskShift: c=n=7)
instr_isWeekendOr = 7_560_321 (ModPrime: c=2, n=7)
instr_startLetter = 6_270_321 (MaskShift: c=n=7)
instr_startLetterOr = 6_760_321 (ModPrime: c=5, n=7)
ModPrime is *slower* than MaskShift per dispatch under the ICP cycle
model (`rem_u` > `and`+`shr_u`), so this policy presently makes
or-patterns run worse than their flat equivalents — the opposite of
what we ultimately want. That's by design for now: the point is to
show the protocol can differentiate. A follow-up commit will replace
the ad-hoc policy with something smarter (likely case-aware Gosper:
extend `find_variant_mask` to accept same-case hashes sharing a
slot, yielding smaller masks and a measurable or-pattern *win*).
The `Dispatch` protocol, emitters, and plan variant surface all stay
put across that refinement.
## Summary

Replaces the linear comparison chain for variant `switch` with O(1) `br_table` dispatch when there are 4 or more arms:

- Find a mask M and shift S such that `(hash_i & M) >> S` are all distinct, then emit `i32.and M; [i32.shr_u S]; br_table`.
- Table size capped at `max(64, 4n)`; falls back to the linear chain if exceeded.
- n = 1 and n = 2 are already handled by `single_case`/`simplify_cases`.

## Benchmark

(vs moc 1.3.0, `test/bench/variant-switch.mo`)

Baseline: `PATH=/nix/store/71w3w2df8xv4x56dkff6sl5yfwd01ccc-moc/bin:$PATH`

Rows measured: `go`, `eval` fib(7) ×100, `transform` ×100.

The FT row is unchanged by design: the finally-tagless form has no `Expr` variant dispatch in its hot loop, confirming the other speedups are real and attributable to the switch optimisation.

## Test plan

- `make -C test/run variant_switch.only` — passes (4-arm Color, 7-arm Weekday, 4-arm Shape with payloads)
- `make -C test/run variants.only` — existing variant tests pass (no regressions)
- `br_table` with `i32.and` emitted; `i32.shr_u` emitted when S > 0 (confirmed for Weekday: mask `0x15000`, shift 12)

## Key files

- `src/codegen/compile_classical.ml` — helpers `bits_needed`, `iter_masks_with_popcount` (Gosper's hack), `is_injective`, `compact_table_size`, `find_variant_mask`; new `SwitchE` case
- `test/run/variant_switch.mo` — new test
- `.claude/plans/variant-switch-br-table.md` — design plan (includes future work: same-body arm merging, or-pattern handling)

## TODOs

- `(^^^)` EDSL (maybe change to finally-tagless?)

🤖 Generated with Claude Code