
experiment: implement masked br_table dispatch for variant switches (n ≥ 4 arms) #5927

Draft
ggreif wants to merge 41 commits into master from gabor/variant-switch

Conversation


@ggreif ggreif commented Mar 21, 2026

Summary

  • Replaces the O(n) linear comparison chain for variant switches with O(1) br_table dispatch when there are 4 or more arms
  • At compile time, finds a bitmask M (and shift S = ctz(M)) such that (hash_i & M) >> S are all distinct, then emits i32.and M; [i32.shr_u S]; br_table
  • Mask-finding uses Gosper's hack to iterate candidates in order of popcount and value, ensuring compact (low-index) masks are tried first
  • Threshold: max(64, 4n) table size; falls back to linear chain if exceeded
  • Strict win for all n ≥ 4 cases (n = 1 and 2 already handled by single_case/simplify_cases)
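The mask search described above can be sketched compactly. This is an illustration in Python, not the compiler's OCaml code: the names `gosper_next` and `find_variant_mask` mirror the helpers listed under Key files, but the exact table-size accounting and iteration bounds here are assumptions.

```python
def gosper_next(m: int) -> int:
    # Gosper's hack: the next-larger integer with the same popcount.
    c = m & -m                  # lowest set bit
    r = m + c                   # ripple the carry upward
    return (((r ^ m) >> 2) // c) | r

def find_variant_mask(hashes, max_table=None):
    """Smallest mask M (tried in order of popcount, then value) such that
    (h & M) >> ctz(M) is injective on `hashes` and the compact table fits
    within the max(64, 4n) threshold from the summary."""
    n = len(hashes)
    if max_table is None:
        max_table = max(64, 4 * n)
    for k in range(max(1, (n - 1).bit_length()), 32):
        m = (1 << k) - 1                        # smallest k-bit mask
        while m < (1 << 32):
            shift = (m & -m).bit_length() - 1   # ctz(m)
            slots = [(h & m) >> shift for h in hashes]
            if (m >> shift) + 1 <= max_table and len(set(slots)) == n:
                return m, shift
            m = gosper_next(m)
    return None
```

Trying masks in popcount-then-value order means compact, low-index masks are hit first, which keeps the br_table small.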

Benchmark (vs moc 1.3.0, test/bench/variant-switch.mo)

Baseline: PATH=/nix/store/71w3w2df8xv4x56dkff6sl5yfwd01ccc-moc/bin:$PATH

| Workload | moc 1.3 | this branch | speedup |
|---|---|---|---|
| 9-arm switch loop ×10 000 (go) | 137,590,321 | 95,690,321 | 1.44× |
| AST eval fib(7) ×100 | 24,492,248 | 22,034,248 | 1.11× |
| AST→FT transform ×100 | 1,189,148 | 1,057,948 | 1.12× |
| FT eval fib(7) ×100 | 21,519,148 | 21,519,148 | 1.00× (noise floor; no variant dispatch) |

The FT row is unchanged by design: the finally-tagless form has no Expr variant dispatch in its hot loop, confirming the other speedups are real and attributable to the switch optimisation.

Test plan

  • make -C test/run variant_switch.only — passes (4-arm Color, 7-arm Weekday, 4-arm Shape with payloads)
  • make -C test/run variants.only — existing variant tests pass (no regressions)
  • Inspect WAT output: br_table with i32.and emitted; i32.shr_u emitted when S > 0 (confirmed for Weekday: mask 0x15000, shift 12)

Key files

  • src/codegen/compile_classical.ml — helpers bits_needed, iter_masks_with_popcount (Gosper's hack), is_injective, compact_table_size, find_variant_mask; new SwitchE case
  • test/run/variant_switch.mo — new test
  • .claude/plans/variant-switch-br-table.md — design plan (includes future work: same-body arm merging, or-pattern handling)

TODOs

  • effective branches (provably same result)
  • distill not from IR, but from the (^^^) EDSL (maybe change to finally-tagless?)

🤖 Generated with Claude Code

ggreif and others added 4 commits March 21, 2026 02:23
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
At compile time, find a bitmask M (and shift S = ctz(M)) such that
(hash_i & M) >> S are all distinct for the n variant tags. Then emit:

    local.get $tag_field
    i32.const M          ;; compile-time constant
    i32.and
    i32.const S          ;; omitted when S = 0
    i32.shr_u            ;; omitted when S = 0
    br_table ...         ;; O(1) dispatch

compared to the previous O(n) linear comparison chain. The break-even
is at n = 3 (worst case and average coincide), but n = 1 and n = 2 are
already handled by single_case / simplify_cases, so the new path is
a strict win for every applicable case (n ≥ 4 with all TagP arms).

Mask-finding uses Gosper's hack to iterate candidate masks in order
of increasing value, ensuring compact (low-index) masks are tried
first and table sizes remain small. Threshold: max(64, 4n).

Also: add test/run/variant_switch.mo covering 4-arm, 7-arm and
payload-carrying variant switches; add "same-body arm merging" to
the plan as a future optimisation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif requested a review from a team as a code owner March 21, 2026 02:13
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif changed the title Implement masked br_table dispatch for variant switches (n ≥ 4 arms) Mar 21, 2026
ggreif and others added 8 commits March 21, 2026 03:27
…d.ml)

Same optimisation as the classical backend: n ≥ 4 all-TagP variant
switches get masked br_table dispatch instead of a linear comparison
chain.  The EOP backend uses int64 hashes throughout, so the helpers
use Int64 arithmetic.  All bench tests now pass (instruction counts
updated to reflect the savings).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
br_table always expects an i32 operand; the EOP backend operates on
i64 values, so add an i64.to_i32 (WrapI64) conversion after the
masked/shifted tag before the br_table instruction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…imisation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmarks the masked br_table dispatch by traversing a synthetic
~700-node expression tree (constructors: Var, Lit, App, Lam, Let,
LetRec, Case, Con) 10_000 times and reporting instruction counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs in compile_enhanced.ml's masked br_table dispatch for EOP:

1. iter_masks_with_popcount iterated all 64-bit k-bit patterns.  EOP
   variant hashes are extend_i32_u values (bits 0-31 only), so masks
   with bits ≥ 32 are useless.  For k≥4 this blew up: C(64,4)=635k
   vs C(32,4)=36k, causing the compiler to hang.  Fix: cap the loop at
   k > 32 / mask ≥ 2^32, matching the classical int32 backend.

2. Nat64.of_int64 mask crashed when mask was negative (bit 63 set).
   Gosper's hack can produce such masks before the new early-exit
   terminates.  Fix: replace with a local ctz64 that works for any
   non-zero int64 by isolating the lowest set bit via a shift loop.
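The ctz replacement in point 2 can be sketched as follows (a Python stand-in for the OCaml helper; the explicit 64-bit masking simulates how a negative int64 bit pattern is viewed as unsigned):

```python
def ctz64(m: int) -> int:
    """Count trailing zeros of a non-zero 64-bit value by shifting the
    lowest set bit down to position 0 -- works even when bit 63 is set,
    i.e. when the value would be a negative int64."""
    m &= (1 << 64) - 1          # view the bit pattern as unsigned 64-bit
    assert m != 0, "ctz64 is undefined for zero"
    n = 0
    while m & 1 == 0:
        m >>= 1
        n += 1
    return n
```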

Also adds test/bench/variant-switch.mo: GHC-Core-like 8-arm expression
interpreter bench (Var/Lit/App/Lam/Let/LetRec/Case/Con) exercising the
hot-path switch dispatch at 10k iterations over a 24-node tree.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e s)

Cases 0 and 1 previously returned leaf nodes (#Var "x", #Lit d) ignoring
the sub-tree s, causing the tree to reset every 8 levels and stay at ≤24
nodes regardless of depth.  Replace with #App(#Var "x", s) and #Lam("k", s)
so every level wraps s; #Case at d%8=6 doubles s, giving exponential growth.

build 15 now produces 80 nodes (800k total/10k iterations, ~70M instructions)
vs the old 24-node cycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add 2^10 iteration cutoff to iter_masks_with_popcount in both backends
  to prevent O(C(32,k)) compile-time blowup on switches where no injective
  mask exists (e.g. nested-pattern switches with duplicate outer labels)
- Add distinct-labels guard to the SwitchE br_table branch: only fire when
  all outer TagP labels are unique, as the known_tag_pat arm codes (outer
  tag check stripped) are only correct for flat variant dispatch
- Document both issues and the deeper fix (None fallback should fall through
  to regular handler) in the plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r dispatch)

Move find_variant_mask into the `when` guard in both backends so that a None
result (cutoff reached, no valid mask, or duplicate labels) causes the guard
to fail and OCaml falls through to the regular SwitchE handler with full
patterns.  This eliminates:
- The broken None branch that used known_tag_pat arms (no outer tag check),
  which caused incorrect dispatch (e.g. debug_show on 12-arm Action_ type
  routed #RegisterKnownNeuron to #AddOrRemoveNodeProvider)
- The distinct-labels workaround (now fully subsumed: duplicate labels make
  is_injective fail for every mask, so find_variant_mask returns None)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif changed the title Implement masked br_table dispatch for variant switches (n ≥ 4 arms) experiment: implement masked br_table dispatch for variant switches (n ≥ 4 arms) Mar 21, 2026

github-actions Bot commented Mar 21, 2026

Comparing from bc32e40 to 704169d:
In terms of gas, 1 tests improved and the mean change is -0.0%.
In terms of size, 1 tests improved and the mean change is -0.0%.

ggreif and others added 6 commits March 21, 2026 11:07
…-31 masks

- Raise Gosper iteration cutoff from 2^10 to 2^16 in both backends,
  enabling larger variant types (e.g. 12-arm NNS Action_ type) to find
  a compact mask where the 2^10 limit would time out
- Change classical backend loop guard from `!m <> 0l` to `!m > 0l`:
  stops at zero (wrapped past 2^32) AND at negative int32 (bit 31 set).
  Motoko hashes are 31-bit (Mo_types.Hash.hash always clears bit 31),
  so masks with bit 31 set are irrelevant and Nat32.of_int32 would crash
  on them with Invalid_argument("value out of bounds")
- EOP backend already caps at 0x1_0000_0000 (32-bit range); 31-bit cap
  is implicitly safe there since hashes fit in 31 bits
- Document resolution of plan item 3 (31-bit vs 32-bit hashes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document a future optimisation replacing the single Gosper stream with
concurrent strategy generators (MaskShift batched by bit-window, ModPrime,
RotLow) merged round-robin and ranked by cycle-cost estimate.  This avoids
any single strategy's worst case dominating compile time, and subsumes the
Pre-shortening section.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a 9th variant `#Prim : Char` for primitive operations (needed for
upcoming `fib` benchmark). Wrapped as `#App (#Prim '+', s)` in `build`
so the recursive sub-tree is preserved and node count stays ~82.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Encodes fib over Peano naturals using #LetRec/#Lam/#Case/#Con/#Prim.
Currently unused (_fibCore); will serve as the benchmark program for
an upcoming eval function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Avoids computing #App (#Prim '-', #Var "n") twice in the recursive arm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif self-assigned this Mar 23, 2026
@ggreif ggreif marked this pull request as draft March 24, 2026 09:20
ggreif and others added 3 commits March 24, 2026 10:36
Also removes _ prefix from fibCore now that it is used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…both

Adds:
- Val/Env runtime types (Peano naturals via #VCon)
- Direct AST interpreter: eval : (Expr, Env) -> Val
- Finally-tagless machinery: FT = Env -> Val, Symantics record, transform
- evalSem: the evaluating Symantics (record of closures, no variant dispatch)
- evalBench: runs fib(7) 100x via eval and via FT, reports instruction counts

Result: fib(7)=13 correct for both; FT ~5% fewer instructions than direct eval.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Benchmarks the `transform(evalSem, fibCore)` step (100 iterations) and
verifies correctness via `fib7_xform`. Updates `.ok` with new instruction
counts including `instr_transform`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ggreif ggreif added the performance Affects only gas usage or code size label Apr 1, 2026
github-merge-queue Bot pushed a commit that referenced this pull request Apr 20, 2026
## Summary

Adds `test/bench/variant-switch.mo` — a small GHC-Core-like interpreter
that exercises 9-arm variant-switch dispatch in several shapes. Serves
as a reference point (baseline on `master`) for the
dispatch-optimisation work on #5927 (masked `br_table`) and any
follow-ups.

Actor methods:
- `go` — top-level `size tree + size fibCore`, ×10k.
- `evalBench` — `fib(7)` via direct AST eval, compiled finally-tagless
form, and the AST→FT transform itself; ×100 each.
- `weekdayBench` — `isWeekend` (7 explicit arms) vs `isWeekendOr` (same
dispatch via `or`-patterns); ×10k over a 7-arm `Weekday` variant.
- `getPerfData` — reports `rts_lifetime_instructions`.

## Master baseline

| Metric | Instructions |
|---|---|
| `size tree + fibCore` (×10k) | 137,590,321 |
| `eval fib(7)` AST (×100) | 24,509,348 |
| `eval fib(7)` FT (×100) | 21,536,248 |
| AST→FT transform (×100) | 1,189,148 |
| `isWeekend` (×10k × 7 arms) | 10,010,321 |
| `isWeekendOr` (×10k × 7 arms) | 11,050,321 |

With #5927 applied, the explicit-arm dispatch numbers drop materially;
or-patterns currently don't benefit (separate follow-up).

## Test plan

- [x] `make -C test/bench variant-switch.only` passes on `master`.
- [x] drun output captured in
`test/bench/ok/variant-switch.drun-run.ok`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ggreif and others added 10 commits April 22, 2026 23:19
Mirrors isWeekend exactly but collapses the 5 weekday cases into one
or-pattern arm and the 2 weekend cases into another. Exercises the
same-body arm-merging path noted in 9d7c498.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V1 (IR-level, `SwitchE` with `TagP` arms) is blind to or-patterns: the
semantically-identical `isWeekday` (7 flat arms) and `isWeekdayOr`
(2-arm or-pattern) end up with different IR shapes, so only the
former hits the `br_table` dispatch path.

Revise "Where to Apply" to introduce Option C: recognise the
tag-hash-compare fragments at the `patternCode` EDSL level, where
both shapes have already elaborated to the same `(^^^)` chain.
Architecture is three handlers using OCaml 5.3 algebraic effects
(same machinery as ConstTrack Phase 3):

  - Recognizer: perform `Variant_arm {hash; body_code}` at each
    tag-compare site.
  - Strategy query: at `compile_switch` entry, collect the effect
    trace and perform `Dispatch_strategy tag_set` to let the
    enclosing handler choose a plan (MaskShift / ModPrime / RotLow
    / Linear).
  - Emit: dispatch on the returned plan; each emitter is
    independently unit-testable.

Annotate the "Same-body arm merging" section to note it becomes
automatic under V2 — distinct arms with identical bodies naturally
collide on effect-payload equality.

V1 stays in place; V2 lands first in `compile_enhanced.ml` (where
`patternCode` lives), then backports to classical if the
architecture proves out.
Three refinements on top of the previous V2 reframing:

1. Handler ↔ Recognizer become distinct roles, not a 3-step pipeline:
   - Handler sees the IR dispatch node + its type; knows which
     strategies are meaningful for this decision shape.
   - Recognizer lives inside the matching EDSL and sees the fully-
     elaborated "test and branch" fragments; performs
     `Match_decision { token_set; scrutinee_repr; type_info }`.

2. The protocol is explicitly generic — not variant-specific. The
   handler interprets `token_set` (tag hashes, literal immediates,
   nominal IDs, …); `scrutinee_repr` abstracts how to obtain the
   discriminating value. Future applications flagged: AND-patterns
   (where the handler can short-circuit components already matched
   by an outer context) and literal-match chains.

3. V2 launch scope narrowed to Gosper-based MaskShift only. The
   multi-strategy batched search stays listed as future work; the
   protocol is forward-compatible.

Success criterion: or-pattern switches must compile to byte-identical
Wasm as their hand-expanded flat-arm equivalents (same mask, shift,
table, arm blocks modulo label numbering). FileCheck test pinning
this equivalence is the V2 acceptance gate.
…ing)

Earlier drafts left ambiguous whether the recognizer walks an
elaborated EDSL tree or fires effects during emission. The EDSL's
value type is opaque — `patternCode = CannotFail of G.t | CanFail of
(G.t -> G.t)` — so no walkable AST survives composition. Clarify:

  - Recognizer fires `Match_decision` effects *during* the procedural
    emitting-combinator calls (`fill_pat`, `compile_pat_local`, …),
    not after.
  - `(^^^)`, `orElse`, `orsPatternFailure` stay pure G.t manipulation.
    Effects attach at leaf combinators that know what's being
    discriminated (TagP, AltP-over-TagPs, later LitP).
  - `body_compiler` is a thunk the handler chooses to invoke or not,
    giving it *control over emission* rather than just strategy
    selection. This is what makes future AND-patterns (where a
    component may already be known from an outer context) a natural
    extension: the handler returns No_op and suppresses emission.

List the concrete perform-sites for V2: `TagP`, `AltP` bottoming out
in `TagP` (the or-pattern fold), and the future `LitP` extension.
An or-pattern over tag constructors — e.g. `(#mon | #tue | ... | #fri)
false` — was previously an opaque single arm from the V1 br_table
guard's perspective, so `isWeekdayOr` did not get the `br_table`
treatment that the structurally-identical `isWeekday` (7 flat arms)
received. Extend the guard to recognise `AltP` chains bottoming out
in `TagP` leaves and count the leaves (not the cases) toward the
4-arm threshold.

Per arm, compile the body once using the first leg's sub-pattern
(Motoko's or-pattern typing guarantees all legs bind the same
variables). Every leaf of a case contributes one slot in the
dispatch table pointing to the same arm block, so same-body arm
merging now happens automatically for or-patterns — the emitted
Wasm is strictly smaller than the hand-expanded flat equivalent
while running the same number of instructions.

Benchmark (`test/bench/variant-switch.mo`, `instr_isWeekendOr`):
  11_050_321 → 7_070_321 (1.56×), matching `instr_isWeekend`
  exactly. Other bench rows also improved where or-patterns nest
  in the dispatch path.

Mirrors the change in both `compile_classical.ml` and
`compile_enhanced.ml`; introduces a shared `flatten_tag_leaves`
helper next to `known_tag_pat`. Effect-based handler/recognizer
split per `.claude/plans/variant-switch-br-table.md` lands as a
follow-up refactor.
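The leaf-flattening and leaf-counting logic can be sketched like this. The tuple encodings `('tag', …)` / `('alt', …)` are invented for illustration; the real `flatten_tag_leaves` works on the IR's pattern type:

```python
def flatten_tag_leaves(pat):
    """Return the list of tag hashes if `pat` is a TagP leaf or an AltP
    chain bottoming out in TagP leaves; None if the pattern doesn't
    qualify for br_table dispatch."""
    kind = pat[0]
    if kind == 'tag':                       # TagP: one leaf
        return [pat[1]]
    if kind == 'alt':                       # AltP: both legs must flatten
        left = flatten_tag_leaves(pat[1])
        right = flatten_tag_leaves(pat[2])
        if left is not None and right is not None:
            return left + right
        return None
    return None                             # any other pattern disqualifies

def qualifies(cases, threshold=4):
    """Count leaves (not cases) toward the n >= 4 threshold."""
    leaves = [flatten_tag_leaves(p) for p in cases]
    return all(l is not None for l in leaves) and sum(map(len, leaves)) >= threshold
```

Under this counting, a 2-case switch whose arms are or-patterns over 5 and 2 tags contributes 7 leaves and qualifies, matching the commit's description.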
… (enhanced)

First concrete step of the V2 handler/recognizer refactor described in
.claude/plans/variant-switch-br-table.md. Introduce a `Dispatch` module
with a `Query : int64 list list -> plan` algebraic effect (OCaml 5.3,
same machinery as ConstTrack Phase 3). The default handler runs
Gosper-based mask-finding and returns a `MaskShift { mask; shift;
table_size; slot_for_case }` plan, or `Linear` as a fallback.

The recognizer at the SwitchE variant case now collects per-case leaf
hashes (one sub-list per case, with or-pattern legs contributing
multiple entries) and asks the handler for a plan. The guard checks
`MaskShift`; the body binds the plan's fields and emits the br_table
dispatch using `slot_for_case` directly instead of rebuilding it
inline.

Behaviour-preserving: compiled Wasm is byte-identical to the previous
commit for `isWeekday` and `isWeekendOr` (same mask 0x15000, shift 12,
23-slot table, same block labels). `variant_switch.mo` passes all
phases.

Why this shape is useful even before landing further refactors:
  - Strategy logic (currently Gosper) is encapsulated in one place.
    Adding ModPrime / RotLow / Linear heuristics becomes a change in
    the handler, not in SwitchE.
  - Outer scopes (tests, debug flags, size-budget passes) can install
    their own handler to override the plan without touching the
    recognizer.
  - The protocol is token-agnostic — future LitP / AndP dispatch can
    perform the same effect with their own token types.

Scope: enhanced backend only. `compile_classical.ml` keeps the inlined
extraction for now and can be retrofitted once the architecture proves
out (per user priority: tests live under enhanced).

Known duplication: the SwitchE guard currently runs `Dispatch.Query`
a second time in the body to pattern-match the plan's fields. Threading
one plan through guard and body is a follow-up — kept simple here to
minimise the diff and make the effect protocol the sole behavioural
change.
…anced)

Previous commit queried `Dispatch.Query` twice — once in the `when`
guard (to test `MaskShift`) and once in the body (to destructure the
plan). The guard fired side-effect-free and the result was discarded.

Collapse the guard + variant arm + default arm into a single
`SwitchE (e, cs) ->` arm that computes `maybe_plan` once. `Some
MaskShift` emits br_table; `Some Linear` or `None` (any case failing
`flatten_tag_leaves`) falls through to the linear-chain emission
inlined under the same arm. Scrutinee compilation (`code1`,
`set_i`/`get_i` local) is hoisted out and shared between the two
paths.

Byte-identical Wasm to the previous commit for `isWeekday` and
`isWeekendOr` (same dispatcher: mask 0x15000, shift 12, 23-slot
table; same block labels). `variant_switch.mo` and `variants.mo`
both pass.

Net: -16 lines and only one `Dispatch.with_handler` invocation
per SwitchE node.

ggreif commented Apr 22, 2026

Reference: weekday variant hashes

Recovered from the compiled Wasm (i64.const at each #Day construction site) and cross-checked against the br_table slots in isWeekend/isWeekendOr. Useful as a concrete worked example when developing alternative Dispatch strategies.

| tag | hash (dec) | hash (hex) | & 0x15000 | br_table slot |
|---|---|---|---|---|
| #Mon | 3_853_996 | 0x3ACEAC | 0x04000 | 4 |
| #Tue | 4_203_428 | 0x4023A4 | 0x00000 | 0 |
| #Wed | 4_349_046 | 0x425C76 | 0x05000 | 5 |
| #Thu | 4_200_545 | 0x401861 | 0x01000 | 1 |
| #Fri | 3_506_557 | 0x35817D | 0x10000 | 16 |
| #Sat | 4_149_254 | 0x3F5006 | 0x15000 | 21 |
| #Sun | 4_153_708 | 0x3F616C | 0x14000 | 20 |

Gosper's mask picks the three bits at positions 12, 14, 16 (mask = 0x15000, shift = 12). Seven weekday tags land on seven distinct slot values {0, 1, 4, 5, 16, 20, 21}; slot 17 is the one valid-but-unused slot visible in the br_table's default-label entries — any future weekday whose hash happened to set bits 0 and 4 (post-shift) would land there without growing the table.

Pattern reading: bit 16 of the hash == weekend-side (set for Fri/Sat/Sun, clear for Mon/Tue/Wed/Thu); bits 12/14 distinguish days within each half.
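The table above can be re-derived mechanically from the hashes (plain Python; the hash values are the ones recovered from the Wasm):

```python
weekday = {
    'Mon': 0x3ACEAC, 'Tue': 0x4023A4, 'Wed': 0x425C76, 'Thu': 0x401861,
    'Fri': 0x35817D, 'Sat': 0x3F5006, 'Sun': 0x3F616C,
}
MASK, SHIFT = 0x15000, 12
slots = {day: (h & MASK) >> SHIFT for day, h in weekday.items()}

# The 3 mask bits sit at post-shift positions 0, 2, 4, so they encode
# 8 values; enumerating them exposes the single unused slot (17).
encodable = {a | b | c for a in (0, 1) for b in (0, 4) for c in (0, 16)}
unused = encodable - set(slots.values())
```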

Strategy notes for the dispatch handler:

  • MaskShift (current default, Gosper): 23-slot table, 2 arithmetic ops (AND + SHR_U) + br_table.
  • ModPrime could try small primes p ≥ 7; e.g. does hᵢ mod 11 give 7 distinct residues? (Quick check: 3853996 % 11 = 3, 4203428 % 11 = 9, 4349046 % 11 = 9, 4200545 % 11 = 8, 3506557 % 11 = 10, 4149254 % 11 = 10, 4153708 % 11 = 9 — collisions Tue↔Wed↔Sun and Fri↔Sat on mod 11; try 13, 17, …). Cheaper table (size p) but rem_u is ~3× an and.
  • RotLow bits=3 rot=? — would need table size 8; worth trying for comparison once the handler protocol supports it.
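The ModPrime probe can be automated. A sketch, with the caveats that the prime list and the p ≥ n requirement (so the residue table has one slot per tag) are assumptions here, and which prime the real handler would pick depends on its cost model:

```python
def first_injective_prime(hashes, primes=(7, 11, 13, 17, 19, 23, 29, 31)):
    """Smallest listed prime p, with p >= len(hashes), under which all
    hashes land on distinct residues -- i.e. h % p is a perfect 7-way
    dispatch key with a p-slot table."""
    n = len(hashes)
    for p in primes:
        if p >= n and len({h % p for h in hashes}) == n:
            return p
    return None
```

For the seven weekday hashes, 7, 11, 13 and 17 all collide; 19 is the first prime that separates them.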

startLetter: 7-way distinct dispatch (flat vs or-pattern)

A richer 7-way switch added to test/bench/variant-switch.mo. Two forms produce the same mask=0x15000, shift=12 dispatcher (same 7 weekday hashes), but partition the arm blocks differently:

startLetter (7 flat arms) — 8 arm blocks

| slot | case idx | tag | outcome |
|---|---|---|---|
| 0 | 1 | #Tue | 'T' |
| 1 | 3 | #Thu | 'T' |
| 4 | 0 | #Mon | 'M' |
| 5 | 2 | #Wed | 'W' |
| 16 | 4 | #Fri | 'F' |
| 20 | 6 | #Sun | 'S' |
| 21 | 5 | #Sat | 'S' |

The Tue and Thu arm blocks are byte-identical (both load i64.const 741070837121024 = 'T'); likewise Sat and Sun both load 732274744098816 = 'S'. Duplicated code, but each block is entered via exactly one br_table slot, so each call costs the same regardless of duplication.

startLetterOr (5 arms, 2 or-patterns) — 5 arm blocks

| slot | case idx | tag | outcome |
|---|---|---|---|
| 0 | 1 | #Tue | 'T' (case shared with #Thu) |
| 1 | 1 | #Thu | 'T' (same case) |
| 4 | 0 | #Mon | 'M' |
| 5 | 2 | #Wed | 'W' |
| 16 | 3 | #Fri | 'F' |
| 20 | 4 | #Sun | 'S' (case shared with #Sat) |
| 21 | 4 | #Sat | 'S' (same case) |

Same br_table, fewer arm blocks (two pairs collapsed).

Bench numbers (10 000 × 7-day iterations)

| metric | instructions |
|---|---|
| instr_isWeekend | 7_070_321 (incl. outer if branch cost) |
| instr_isWeekendOr | 7_070_321 |
| instr_startLetter | 6_270_321 (clean dispatch + Char sink) |
| instr_startLetterOr | 6_270_321 |

Key observation for strategy design: startLetter and startLetterOr cost exactly the same at runtime despite the or-pattern form having 2 fewer arm blocks. Same-body arm duplication is a pure code-size bloat — each slot's br_table entry jumps directly into its own block, which executes the same ≈3 Wasm instructions regardless of whether a byte-identical block lives next to it.

Where same-body merging would matter: upstream of the dispatcher. If the recognizer were to merge equivalence classes before handing the token set to the handler, N = 5 instead of N = 7. Then ModPrime could try mod 5 instead of mod 7 (half the table bytes), MaskShift would have fewer injectivity constraints (possibly a smaller mask), and perfect-hash search gets cheaper. That's the lever the plan now captures under Future Optimisation: Same-body arm merging — with the refinement that the equivalence criterion should be raw Wasm byte sequences (each arm is already a Block internally and (^^^) is difference-list concatenation of G.t), not IR structural equality.

For V2 as shipped here, or-pattern merging is the only channel; cross-case merging stays user-driven (write the or-pattern to get the size win).

ggreif and others added 6 commits April 23, 2026 01:16
…valence later

Rework the "Future Optimisation: Same-body arm merging" section to
record today's scope decision and the refinement direction.

Key points captured:

- V2 deliberately does NOT auto-merge arms with structurally-equal
  bodies across distinct cases. User-written or-patterns are the
  incentive channel — they communicate intent and are stable under
  refactoring. The recognizer's `flatten_tag_leaves` already collapses
  or-pattern legs; cross-case merging stays out of scope.

- Why merging matters upstream: same-body merging is a code-size win
  (duplicated arm blocks saved) but NOT a speedup for the
  already-dispatched case — each br_table slot still lands in its own
  block executing the same instructions. The runtime payoff is in
  *strategy choice*: the handler's search space is parameterised by N =
  distinct outcome classes, so ModPrime uses a smaller prime, MaskShift
  has fewer injectivity constraints, perfect-hash search gets cheaper.

- When cross-case merging eventually lands, the equivalence criterion
  should be raw Wasm byte sequences, not IR structural equality. Each
  arm is already a `Block` internally; `(^^^)` composition is
  difference-list concatenation of `G.t`; comparing compiled bytes
  skips IR phase-ordering noise and catches arms that incidentally
  lower to the same instructions. The `Dispatch.Query` protocol is
  already compatible (token_set is list-of-lists) — this is a
  recognizer-side extension, not a protocol change.
The existing isWeekend/isWeekendOr pair returns Bool, so an outer
`if (isWeekend d) acc1 += 1` branch muddies the per-switch cost.
Add a 7-way distinct-outcome pair that writes a Char sink directly,
giving a cleaner microbench of switch dispatch alone.

Bodies of startLetter include two natural same-body groups:
{Tue, Thu} → 'T' and {Sat, Sun} → 'S'. The -Or form collapses these
into or-patterns. Both compile to the same br_table dispatcher and
execute identical instruction counts — same-body arm blocks cost the
same regardless of whether they're physically one block (or-pattern)
or duplicated (flat). Useful datapoint when evaluating future Dispatch
strategies.

On the current branch (Gosper MaskShift, same-body merging via
or-patterns only):

  instr_isWeekend      = 7_070_321   (outer `if` adds ~800k)
  instr_isWeekendOr    = 7_070_321
  instr_startLetter    = 6_270_321   (cleaner: no outer branch)
  instr_startLetterOr  = 6_270_321
…tch_join)

Replaces the single one-shot effect
  `Query : int64 list list -> plan`
with a streaming pair:
  `Match_arm : int64 list -> unit`   (submit one case's leaves)
  `Match_join : plan`                 (join all arms into a plan)

The recognizer at SwitchE now iterates cases, performs `Match_arm
hashes` per case, then `Match_join` to receive the plan. The handler
accumulates arms in a mutable ref across the stream and commits the
Gosper-based plan at join time.

Behaviour-preserving: variant_switch.mo and variants.mo pass; the
br_table for startLetter is byte-identical to the non-streaming
version (same mask 0x15000, shift 12, 23-slot table, same labels).

Why change the shape now, before any consumer needs it? The streaming
protocol is how AND-patterns and literal-match chains will surface
decisions — subcomponents fire `Match_arm` incrementally, only the
outer context knows when to `Match_join`. Locking the protocol in now
means those future recognizers slot in without a breaking-change to
the effect type. The nested-switch case also cleanly works because
each `with_handler` scope has its own accumulator ref; state doesn't
leak between switches.

The naming `Match_join` rather than `Match_close` reflects the
semantics: the handler joins submitted arms into one dispatch
decision; it is not merely closing a stream.
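Without OCaml 5.3 effects to hand, the streaming shape can still be illustrated with a plain accumulator object (a Python stand-in: `Match_arm`/`Match_join` become methods, and one fresh instance per switch mirrors the per-`with_handler` accumulator ref, so nested switches never share state):

```python
class DispatchHandler:
    """Stand-in for the effect handler: match_arm submits one case's
    leaf hashes, match_join commits a plan over everything submitted."""
    def __init__(self):
        self.arms = []                      # per-switch accumulator

    def match_arm(self, leaf_hashes):       # ~ perform (Match_arm hashes)
        self.arms.append(list(leaf_hashes))

    def match_join(self):                   # ~ perform Match_join
        leaves = [h for arm in self.arms for h in arm]
        if len(leaves) < 4:                 # below the br_table threshold
            return ('Linear',)
        return ('MaskShift', leaves)        # real handler: Gosper search here
```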
…stration

Adds the second concrete plan variant `ModPrime { p; case_for_residue }`
to the `Dispatch` protocol, emitted as `hash rem_u p; br_table` with
a p-slot table. The handler searches primes {2, 3, 5, 7, ..., 31}
smallest-first and accepts the first that partitions the input
cleanly (all leaves of the same case share a residue AND different
cases land on different residues).

The `choose_plan` policy is intentionally ad-hoc:

  - if `n < 4` → Linear
  - else if `c < n` → ModPrime (fall back to MaskShift if no prime works)
  - else                → MaskShift

where `n` = total leaves, `c` = number of cases. `c < n` ↔ at least
one case has an or-pattern.
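The "partitions the input cleanly" acceptance test from the previous commit can be written out directly (a sketch; `cases` mirrors the token-set shape, a list of per-case leaf-hash lists, and the example numbers in the usage note are invented):

```python
def modprime_ok(cases, p):
    """True iff p partitions cleanly: every leaf of a case shares one
    residue mod p, and distinct cases land on distinct residues."""
    residues = []
    for leaves in cases:
        rs = {h % p for h in leaves}
        if len(rs) != 1:
            return False                    # leaves of one case disagree
        residues.append(rs.pop())
    return len(set(residues)) == len(cases) # no cross-case collision
```

E.g. for the made-up token set `[[10, 17], [4]]`, p = 7 passes (10 and 17 both have residue 3, 4 has residue 4), while p = 5 fails because 10 and 17 disagree mod 5.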

Why the ad-hoc split matters: it gives the handler a *visible,
measurable* reason to branch on or-pattern structure, proving the
Dispatch protocol is wired up to select different strategies for
or-patterns vs flat expansions of the same switch. Bench numbers
after this patch:

  instr_isWeekend     = 7_070_321    (MaskShift: c=n=7)
  instr_isWeekendOr   = 7_560_321    (ModPrime:  c=2, n=7)
  instr_startLetter   = 6_270_321    (MaskShift: c=n=7)
  instr_startLetterOr = 6_760_321    (ModPrime:  c=5, n=7)

ModPrime is *slower* than MaskShift per dispatch under the ICP cycle
model (`rem_u` > `and`+`shr_u`), so this policy presently makes
or-patterns run worse than their flat equivalents — the opposite of
what we ultimately want. That's by design for now: the point is to
show the protocol can differentiate. A follow-up commit will replace
the ad-hoc policy with something smarter (likely case-aware Gosper:
extend `find_variant_mask` to accept same-case hashes sharing a
slot, yielding smaller masks and a measurable or-pattern *win*).
The `Dispatch` protocol, emitters, and plan variant surface all stay
put across that refinement.