
feat: support fixed-width Nat types as array and blob indices#5929

Open
q-uint wants to merge 13 commits into caffeinelabs:master from q-uint:quint/nat64-array-indexing

Conversation


@q-uint q-uint commented Mar 22, 2026

Summary

  • Allow Nat8, Nat16, Nat32, and Nat64 values as array indices (and Nat64 for blobs), bypassing bignum conversion for direct word-sized index operations
  • All fixed-width Nat types compile to direct indexing, reducing IC instruction counts by 30-59% compared to Nat indices
  • Changes span the type checker, both codegen backends, IR checker, both interpreters, and include run/trap/fail/bench tests
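
A minimal sketch of the user-facing change (the array and index names here are illustrative, not taken from the PR's tests):

```motoko
import Nat64 "mo:base/Nat64";

let arr : [Nat] = [10, 20, 30, 40];
let i : Nat64 = 2;

// Before: only Nat indices type-check, so fixed-width values need a cast,
// which round-trips through a bignum at runtime.
let a = arr[Nat64.toNat(i)];

// With this PR: fixed-width Nat types are accepted directly and compile
// to a word-sized bounds check and load.
let b = arr[i];
```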

IC instruction counts (1000 iters x 256 elements)

| Type | Instructions | vs Nat |
|---|---|---|
| Nat (baseline) | 25,687,306 | -- |
| Nat8 | 11,036,370 | -57% |
| Nat16 | 11,079,329 | -57% |
| Nat32 | 10,567,324 | -59% |
| Nat64 | 17,991,325 | -30% |
| Nat64+toNat() | 20,807,325 | -19% |

q-uint added 5 commits March 22, 2026 16:24
Allow Nat64 values as array indices, bypassing the bignum-to-word64
conversion that Nat indices require. On the IC, this reduces array
access instructions by ~30% when the index is already a Nat64.

Changes span the type checker (accept Nat64 in index position), both
codegen backends (dispatch to direct word64 indexing), the IR checker,
and both interpreters.
The IdxE handler in interpret.ml dispatched on Nat64 before matching
on Blob vs array, meaning blob[nat64_val] would have succeeded if it
reached the interpreter. The type checker already rejects this, so it
was unreachable, but the contract was looser than necessary. Move the
Nat64 dispatch inside the array-only branch.
Extend indexing to accept Nat8, Nat16, Nat32 in addition to Nat64.
All fixed-width Nat types compile to direct word-sized index operations,
bypassing bignum conversion entirely.

IC instruction counts (1000 iters x 256 elements):
  Nat (baseline):    25,687,306
  Nat8:              11,036,370  (-57%)
  Nat16:             11,079,329  (-57%)
  Nat32:             10,567,324  (-59%)
  Nat64:             17,991,325  (-30%)
  Nat64+toNat():     20,807,325  (-19%)

Also extends Nat64 indexing to blobs, scopes interpreter Nat64
dispatch to the correct branches, and uses T.sub uniformly in
place of the T.sub/T.eq asymmetry.
@q-uint q-uint requested a review from a team as a code owner March 22, 2026 16:11
Comment thread src/lang_utils/error_codes.ml Outdated
Author

q-uint commented Mar 23, 2026

CI is failing because cachix watch-exec requires CACHIX_AUTH_TOKEN, which isn't available to workflows triggered from fork PRs? The build/test themselves should be fine, it's just the Cachix authentication step that errors out.

Contributor

ggreif commented Mar 23, 2026

Just an idea. Normally a small Nat will be used for indexing, and checking for that should be trivial: mask the highest and lowest bits, and if both are zero, shift the number right by one. This is the offset from the array's start. Maybe we should run the benchmarks against this speculative (low-hanging fruit) attempt.

Author

q-uint commented Mar 23, 2026

Tried this out: mask bit 0 (scalar tag) and bit 63 (sign), and if both zero, shift right to recover the raw index and go straight to Arr.idx, bypassing the idx_bigint -> BigNum.to_word64_with call chain entirely. The catch is that BigNum.to_word64_with (in MakeCompact) already has an inline tagged-scalar fast path: it checks bit 0, branches, and unboxes with a shift. So the speculative check is doing essentially the same work, just one function call closer to the indexing site.
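
The speculative check can be pictured as follows. This is an illustrative Motoko-level model of the compiled Wasm logic, following the bit layout described in the comment above; the names and encoding details are an assumption, not the actual compiler code:

```motoko
// Hypothetical model: a compact Nat is either a tagged scalar (bit 0 clear,
// value shifted left by one) or a heap bigint pointer (bit 0 set).
func tryFastIndex(word : Nat64) : ?Nat64 {
  let tagBit : Nat64 = 1;        // bit 0: scalar tag, must be clear
  let signBit : Nat64 = 1 << 63; // bit 63: must be clear for a valid index
  if (word & (tagBit | signBit) == 0) {
    ?(word >> 1)                 // untag: shift right to recover the raw index
  } else {
    null                         // heap bigint: fall through to idx_bigint
  }
};
```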

Benchmark results (256K accesses, enhanced backend):

| Index type | Before | After |
|---|---|---|
| Nat32 | 19,538,348 | 19,538,348 |
| Nat64 | 23,634,342 | 23,634,342 |
| Nat | 36,968,319 | 36,712,319 (-0.7%) |
| Nat64->toNat | 29,522,342 | 29,266,342 (-0.9%) |

But maybe I interpreted the suggestion wrong?

4299689

When indexing with plain Nat, the compiler always called idx_bigint,
paying for a function call and error message local setup even when
the index is a small tagged scalar (the common case).

Inline the scalar check at the four indexing sites (array and blob,
classical and enhanced): mask the tag and sign bits, and if both
zero, shift to recover the raw index and call idx directly, bypassing
idx_bigint entirely. Heap bigints fall through to the existing path.

Benchmark (enhanced, 256K accesses): Nat index -0.7% instructions.
github-merge-queue Bot pushed a commit that referenced this pull request Mar 23, 2026
## Problem

Fork PRs don't receive repository secrets. When `CACHIX_AUTH_TOKEN` is
empty, `cachix watch-exec` fails with a `NoSigningKey` error before
tests even start, blocking all CI for external contributors.

## Solution

Make the `cachix watch-exec` wrapper conditional in the test blueprint
action. When the auth token is absent (fork PR), run
`nix-build-uncached` directly without the push wrapper.

- The `cachix-action` step already handles an empty token gracefully —
it configures read-only substituter access via `cachix use`, so cached
derivations are still fetched.
- Only the push of newly-built derivations is lost, which is acceptable
for fork PRs.
- The token is passed via an `env` variable (not inline) to avoid
leaking it in logs.
- No caller changes needed — `${{ secrets.CACHIX_AUTH_TOKEN }}`
naturally resolves to empty for fork PRs.

Motivated by PR #5929.

Made with [Cursor](https://cursor.com)
Contributor

Kamirus commented Mar 23, 2026

> CI is failing because cachix watch-exec requires CACHIX_AUTH_TOKEN, which isn't available to workflows triggered from fork PRs? The build/test themselves should be fine, it's just the Cachix authentication step that errors out.

Now the CI works with forks as well

q-uint added 2 commits March 23, 2026 14:07
Nat64 index type checking now infers index sub-expressions,
which correctly surfaces M0155 (operator may trap) for Nat
subtraction in array indices.
Contributor

crusso commented Mar 24, 2026

Hi Quint,

Thanks for the PR. It looks like you are essentially overloading array indexing on all unsigned types.

However, this seems to fly in the face of the Motoko KISS principle (although maybe that ship sailed long ago), especially the rule of not allowing implicit conversions between types.

What is the motivation? If it is just perf, not programmer convenience, I think we could easily enough optimize any
a[NatXX.toNat(n)] indexing (in compile_array_index) without all these changes to typing etc.

Would that be enough for you or do you actually want to avoid the user writing those casts?

I'm a little worried that flipping from checking the index against type Nat to inferring its type might actually break a fair bit of code.

If we really want to support this, maybe just adding explicit a.sub8() etc. prims would be enough for people who want to micro-optimize their code.

BTW, I heard a rumour that you want to extend subtyping and allow NatXX <: Nat, but that's a non-starter because Motoko's subtyping is by subsumption, not coercion. That is, subtyping always compiles to a no-op, not a cast that transforms the value to a new one. To support NatXX <: Nat we would need to ensure the representation of these values is always the same, or modify every operation on Nat to allow all of the NatXX representations as inputs. A bit like you've done for the special case of array indexing.

Contributor

crusso commented Mar 24, 2026

(labelling this DO-NOT-MERGE for now)

Author

q-uint commented Mar 24, 2026

Hi Claudio (@crusso), this was just me going on a wild adventure. My motives are just fun and random perf improvements, and I don't like the number casts personally.

Happy to just close this if it goes against the Motoko design principles, same for the rumored issue #5934. I hope I didn't take up too much time.

Contributor

crusso commented Mar 25, 2026

For perf, I think we could just spot the immediate cast before indexing and do a fastpath.

For convenience, another idea might be to not overload (just) indexing, but allow the compiler to insert NatX -> Nat casts (and IntX -> Int casts) where appropriate, guided by bi-directional type checking. I can see this getting in the way of tail calls sometimes. Might also be problematic for type argument inference... an application might check with type arguments (and implicit casting), but not without.

ggreif pushed a commit that referenced this pull request Mar 26, 2026
## Problem

Fork PRs don't receive repository secrets. When `CACHIX_AUTH_TOKEN` is
empty, `cachix watch-exec` fails with a `NoSigningKey` error before
tests even start, blocking all CI for external contributors.

## Solution

Make the `cachix watch-exec` wrapper conditional in the test blueprint
action. When the auth token is absent (fork PR), run
`nix-build-uncached` directly without the push wrapper.

- The `cachix-action` step already handles an empty token gracefully —
it configures read-only substituter access via `cachix use`, so cached
derivations are still fetched.
- Only the push of newly-built derivations is lost, which is acceptable
for fork PRs.
- The token is passed via an `env` variable (not inline) to avoid
leaking it in logs.
- No caller changes needed — `${{ secrets.CACHIX_AUTH_TOKEN }}`
naturally resolves to empty for fork PRs.

Motivated by PR #5929.

Made with [Cursor](https://cursor.com)
@skilesare

I'll argue for doing something. This is having a material effect on some things that I'm doing that are currently blowing out the instruction limit. By my calculations, I'm doing over 5.1 million iterations over various different arrays...many of which are small 4-item arrays that could really benefit from a Nat8 loop given @q-uint's numbers above.

I get the KISS stuff, but the reality of the language is that right now it is more performant to do

  var i64 : Nat64 = 0;
  while (i64 <= 7) { // 0..8
      let i = Nat64.toNat(i64);
      t += n[i] + other.n[i];
      i64 += 1;
  };

than

  var i : Nat = 0;
  while (i <= 7) { // 0..8
      t += n[i] + other.n[i];
      i += 1;
  };

...which seems kind of not simple. At least from the user's perspective.

Author

q-uint commented Mar 27, 2026

Claudio's suggestion targets a different scenario than what the PR handles.

My changes make Nat64 (and other NatX) first-class index types. When you write arr[myNat64], codegen sees e2 : Nat64 and emits the dedicated idx_nat64 path directly: unbox, bounds-check, done.

But if someone writes arr[Nat64.toNat(x)], the index expression has type Nat, so codegen hits the plain-Nat branch with the speculative scalar check and potential idx_bigint fallback, even though x was just a Nat64.

The suggestion: pattern-match in codegen to recognize a toNat cast wrapping a NatX, peel it off, and emit the idx_nat64 path using the original x directly. A codegen peephole, no type system changes.
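
In other words, the two approaches cover different source spellings (a sketch with illustrative names; `arr` is some array in scope):

```motoko
let x : Nat64 = 3;

let a = arr[x];               // this PR: Nat64 accepted as an index type
let b = arr[Nat64.toNat(x)];  // peephole suggestion: cast recognized in
                              // codegen and elided, no type-system change
```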

| | My PR | Suggestion |
|---|---|---|
| Where | Type system + codegen | Codegen only (peephole) |
| Handles | `arr[myNat64]` | `arr[Nat64.toNat(x)]` |
| How | New index types accepted | Recognize cast in AST, elide it |
| Benefit | Best perf (-30-59%), clean types | -19%, optimizes legacy code patterns |

Contributor

crusso commented Apr 1, 2026

> Benefit: Best perf (-30-59%), clean types | -19%, optimizes legacy code patterns

@Quint - did you actually try my peephole suggestion? Any idea why it gets less perf back than yours? Did you also apply the peephole for indexing on left of an assignment?

Recognize arr[natNToNat(x)] patterns at codegen time and compile
x directly as its native fixed-width type, bypassing the bigint
conversion entirely. Handles both direct prim calls and PrimWrapper
function calls (e.g. nat64ToNat from the prelude).
Author

q-uint commented Apr 1, 2026

Seems like I didn't test it properly.

Benchmark (1000 x 256 accesses, enhanced)

| Index type | Before | After | Change |
|---|---|---|---|
| Nat | 25,687,306 | 25,687,306 | -- |
| Nat8 | 11,036,370 | 11,036,370 | -- |
| Nat16 | 11,079,329 | 11,079,329 | -- |
| Nat32 | 10,567,324 | 10,567,324 | -- |
| Nat64 | 17,991,325 | 17,991,325 | -- |
| Nat64.toNat() | 20,807,325 | 17,991,325 | -13.5% |

arr[nat64ToNat(n)] now compiles to the same code as arr[n] with a native Nat64 index.

Revert all type/interpreter changes that allowed fixed-width Nat
types as direct array/blob indices. The peephole optimization in
the codegen (unwrap_toNat) achieves the same performance benefit
by recognizing arr[natNToNat(x)] patterns and eliding the
conversion, without any language-level changes.
Author

q-uint commented Apr 1, 2026

The peephole produces identical instruction counts to the native type-change approach:

| Index path | Type change | Peephole (toNat) |
|---|---|---|
| Nat8 | 11,036,370 | 11,036,370 |
| Nat16 | 11,079,329 | 11,079,329 |
| Nat32 | 10,567,324 | 10,567,324 |
| Nat64 | 17,991,325 | 17,991,325 |

Zero overhead from the toNat() call, it's completely elided at compile time.

Contributor

crusso commented Apr 1, 2026

So maybe that would be good enough, without overloading indexing. Funnily enough, I just started on a PR to do that, but maybe you can extract one that just does the peephole optimization (or I steal some of your code as well as your test ;-))

If so, it's probably worth double checking the optimization still kicks in if we call

  arr[Nat64.toNat(n)]
  arr[n.toNat()]

Comment thread src/codegen/compile_enhanced.ml Outdated
Comment thread test/bench/nat64-index.mo
Comment thread src/codegen/compile_classical.ml Outdated
negative Int that somehow got here) falls through to
Arr.idx_bigint, which handles the full conversion.

Both array and index are saved to locals because wasm
Contributor


I wonder if that's actually true or just a limitation of our E.if_ helper that doesn't let you supply an operand stack type for each branch.

Wasm 1.0 didn't allow the branches to assume the stack is non-empty. But Wasm 2.0 does.
Unfortunately, I'm not entirely sure we are allowed to use Wasm 2.0 features.

@ggreif are we good with Wasm 2.0 instructions?

If so, we don't need new locals and can avoid the stores and loads of get_va and get_vi.

Recognize arr[n.toNat()] patterns at codegen time and elide the
conversion, compiling the inner value directly as its native type.
Works through prim calls, core library, and method syntax.

Remove the speculative scalar fast path for plain Nat indexing
and revert type system changes for fixed-width Nat indices.
| Ir.PrimE (Ir.CallPrim _,
[{it = Ir.PrimE (Ir.DotPrim name, [{it = Ir.VarE (_, mod_var); _}]); _}; inner]) ->
begin match VarEnv.lookup_var ae mod_var with
| Some (VarEnv.Const (Const.Obj fs)) ->
Contributor


I guess one could even iterate this for deeper projection paths, but probably not worth it.

Contributor

crusso commented Apr 2, 2026

I have a hunch this approach still won't catch compound array assignments that combine an operator with update, like `a[i] += 1`. Do we care? I'm not sure I do.

Author

q-uint commented Apr 2, 2026

No, a[i] += 1 won't hit the peephole, but it doesn't matter I think? The parser desugars it into let tmp = i; a[tmp] = a[tmp] + 1, so the toNat call binds once into a plain Nat local and both array accesses reuse it.
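
Roughly, assuming the desugaring described above (the temporary's name is illustrative):

```motoko
// a[i] += 1 desugars roughly to:
let tmp = i;           // a toNat cast in the index evaluates once here
a[tmp] := a[tmp] + 1;  // both accesses reuse the plain Nat local,
                       // so there is no cast left for the peephole to elide
```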

Contributor

crusso commented Apr 2, 2026

> No, a[i] += 1 won't hit the peephole, but it doesn't matter I think? The parser desugars it into let tmp = i; a[tmp] = a[tmp] + 1, so the toNat call binds once into a plain Nat local and both array accesses reuse it.

Also, our desugaring is pretty inefficient duplicating the field/index computation (and bounds check). So we could optimize this way more anyway (but not in this PR). Opened an issue for that.

#5972

Contributor

crusso commented Apr 2, 2026

(If the benchmark fails, I think you need to run make accept to update the numbers to use eop. If you just run ../run.sh -a -p it will run the benchmarks with 32-bit.)

Or use

EXTRA_MOC_ARGS=--enhanced-orthogonal-persistence ../run.sh -a -p <test>.mo

Author

q-uint commented Apr 2, 2026

The Create GitHub App Token step in test.yml (line 71-76) runs unconditionally, but fork PRs don't have access to vars.GENERIC_CI_RW_APP_ID or secrets.GENERIC_CI_RW_APP_PRIVATE_KEY, causing this failure:

Error: [@octokit/auth-app] appId option is required

The perf step already guards on is_fork (line 86), but the token creation step itself needs a condition too, e.g.:

- name: Create GitHub App Token
  if: github.event.pull_request.head.repo.full_name == github.repository
  uses: actions/create-github-app-token@v3

Without this, every fork PR fails at this step.

@Kamirus

@skilesare

I kind of ran into this again with Array.tabulate... looks like there is a Prim function that does this that has Nat hardcoded as the func parameter... I wonder if there are savings there as well?
