
feat: support fixed-width Nat types as array and blob indices#5929

Open
q-uint wants to merge 13 commits into caffeinelabs:master from q-uint:quint/nat64-array-indexing

Conversation


@q-uint q-uint commented Mar 22, 2026

Summary

  • Allow Nat8, Nat16, Nat32, and Nat64 values as array indices (and Nat64 for blobs), bypassing bignum conversion for direct word-sized index operations
  • All fixed-width Nat types compile to direct indexing, reducing IC instruction counts by 30-59% compared to Nat indices
  • Changes span the type checker, both codegen backends, IR checker, both interpreters, and include run/trap/fail/bench tests
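
A minimal sketch of the user-facing change (the array and index names here are illustrative, not taken from the PR's tests):

```motoko
import Nat64 "mo:base/Nat64";

let arr : [Nat] = [10, 20, 30, 40];
let i : Nat64 = 2;

// Before: only Nat indices type-check, so fixed-width values need a cast,
// which round-trips through a bignum at runtime.
let a = arr[Nat64.toNat(i)];

// With this PR: fixed-width Nat types are accepted directly and compile
// to a word-sized bounds check and load.
let b = arr[i];
```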

IC instruction counts (1000 iters x 256 elements)

| Type | Instructions | vs Nat |
|---|---|---|
| Nat (baseline) | 25,687,306 | -- |
| Nat8 | 11,036,370 | -57% |
| Nat16 | 11,079,329 | -57% |
| Nat32 | 10,567,324 | -59% |
| Nat64 | 17,991,325 | -30% |
| Nat64+toNat() | 20,807,325 | -19% |

q-uint added 5 commits March 22, 2026 16:24
Allow Nat64 values as array indices, bypassing the bignum-to-word64
conversion that Nat indices require. On the IC, this reduces array
access instructions by ~30% when the index is already a Nat64.

Changes span the type checker (accept Nat64 in index position), both
codegen backends (dispatch to direct word64 indexing), the IR checker,
and both interpreters.
The IdxE handler in interpret.ml dispatched on Nat64 before matching
on Blob vs array, meaning blob[nat64_val] would have succeeded if it
reached the interpreter. The type checker already rejects this, so it
was unreachable, but the contract was looser than necessary. Move the
Nat64 dispatch inside the array-only branch.
Extend indexing to accept Nat8, Nat16, Nat32 in addition to Nat64.
All fixed-width Nat types compile to direct word-sized index operations,
bypassing bignum conversion entirely.

IC instruction counts (1000 iters x 256 elements):
  Nat (baseline):    25,687,306
  Nat8:              11,036,370  (-57%)
  Nat16:             11,079,329  (-57%)
  Nat32:             10,567,324  (-59%)
  Nat64:             17,991,325  (-30%)
  Nat64+toNat():     20,807,325  (-19%)

Also extends Nat64 indexing to blobs, scopes interpreter Nat64
dispatch to the correct branches, and uses T.sub uniformly in
place of the T.sub/T.eq asymmetry.
@q-uint q-uint requested a review from a team as a code owner March 22, 2026 16:11
Comment thread src/lang_utils/error_codes.ml Outdated
Author

q-uint commented Mar 23, 2026

CI is failing because cachix watch-exec requires CACHIX_AUTH_TOKEN, which isn't available to workflows triggered from fork PRs? The build/test themselves should be fine, it's just the Cachix authentication step that errors out.

Contributor

ggreif commented Mar 23, 2026

Just an idea. Normally a small Nat will be used for indexing, and checking for that should be trivial: mask the highest and lowest bits, and if both are zero, shift the number right by one. This is the offset from the array's start. Maybe we should run the benchmarks against this speculative (low-hanging fruit) attempt.

Author

q-uint commented Mar 23, 2026

Tried this out: mask bit 0 (scalar tag) and bit 63 (sign), and if both zero, shift right to recover the raw index and go straight to Arr.idx, bypassing the idx_bigint -> BigNum.to_word64_with call chain entirely. The catch is that BigNum.to_word64_with (in MakeCompact) already has an inline tagged-scalar fast path: it checks bit 0, branches, and unboxes with a shift. So the speculative check is doing essentially the same work, just one function call closer to the indexing site.
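
The speculative check can be pictured as follows. This is an illustrative Motoko-level model of the compiled Wasm logic, following the bit layout described in the comment above; the names and encoding details are an assumption, not the actual compiler code:

```motoko
// Hypothetical model: a compact Nat is either a tagged scalar (bit 0 clear,
// value shifted left by one) or a heap bigint pointer (bit 0 set).
func tryFastIndex(word : Nat64) : ?Nat64 {
  let tagBit : Nat64 = 1;        // bit 0: scalar tag, must be clear
  let signBit : Nat64 = 1 << 63; // bit 63: must be clear for a valid index
  if (word & (tagBit | signBit) == 0) {
    ?(word >> 1)                 // untag: shift right to recover the raw index
  } else {
    null                         // heap bigint: fall through to idx_bigint
  }
};
```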

Benchmark results (256K accesses, enhanced backend):

| Index type | Before | After |
|---|---|---|
| Nat32 | 19,538,348 | 19,538,348 |
| Nat64 | 23,634,342 | 23,634,342 |
| Nat | 36,968,319 | 36,712,319 (-0.7%) |
| Nat64->toNat | 29,522,342 | 29,266,342 (-0.9%) |

But maybe I interpreted the suggestion wrong?

4299689

When indexing with plain Nat, the compiler always called idx_bigint,
paying for a function call and error message local setup even when
the index is a small tagged scalar (the common case).

Inline the scalar check at the four indexing sites (array and blob,
classical and enhanced): mask the tag and sign bits, and if both
zero, shift to recover the raw index and call idx directly, bypassing
idx_bigint entirely. Heap bigints fall through to the existing path.

Benchmark (enhanced, 256K accesses): Nat index -0.7% instructions.
github-merge-queue Bot pushed a commit that referenced this pull request Mar 23, 2026
## Problem

Fork PRs don't receive repository secrets. When `CACHIX_AUTH_TOKEN` is
empty, `cachix watch-exec` fails with a `NoSigningKey` error before
tests even start, blocking all CI for external contributors.

## Solution

Make the `cachix watch-exec` wrapper conditional in the test blueprint
action. When the auth token is absent (fork PR), run
`nix-build-uncached` directly without the push wrapper.

- The `cachix-action` step already handles an empty token gracefully —
it configures read-only substituter access via `cachix use`, so cached
derivations are still fetched.
- Only the push of newly-built derivations is lost, which is acceptable
for fork PRs.
- The token is passed via an `env` variable (not inline) to avoid
leaking it in logs.
- No caller changes needed — `${{ secrets.CACHIX_AUTH_TOKEN }}`
naturally resolves to empty for fork PRs.

Motivated by PR #5929.

Made with [Cursor](https://cursor.com)
Contributor

Kamirus commented Mar 23, 2026

> CI is failing because cachix watch-exec requires CACHIX_AUTH_TOKEN, which isn't available to workflows triggered from fork PRs? The build/test themselves should be fine, it's just the Cachix authentication step that errors out.

Now the CI works with forks as well

q-uint added 2 commits March 23, 2026 14:07
Nat64 index type checking now infers index sub-expressions,
which correctly surfaces M0155 (operator may trap) for Nat
subtraction in array indices.
Contributor

crusso commented Mar 24, 2026

Hi Quint,

Thanks for the PR. It looks like you are essentially overloading array indexing on all unsigned types.

However, this seems to fly in the face of the Motoko KISS principle (although maybe that ship sailed long ago), especially the rule of not allowing implicit conversions between types.

What is the motivation? If it is just perf, not programmer convenience, I think we could easily enough optimize any
a[NatXX.toNat(n)] indexing (in compile_array_index) without all these changes to typing etc.

Would that be enough for you or do you actually want to avoid the user writing those casts?

I'm a little worried that flipping from checking the index against type Nat to inferring its type might actually break a fair bit of code.

If we really want to support this, maybe just adding explicit a.sub8() etc. prims would be enough for people who want to micro-optimize their code.

BTW, I heard a rumour that you want to extend subtyping and allow NatXX <: Nat, but that's a non-starter because Motoko's subtyping is by subsumption, not coercion. That is, subtyping always compiles to a no-op, not a cast that transforms the value to a new one. To support NatXX <: Nat we would need to ensure the representation of these values is always the same, or modify every operation on Nat to allow all of the NatXX representations as inputs. A bit like you've done for the special case of array indexing.

Contributor

crusso commented Mar 24, 2026

(labelling this DO-NOT-MERGE for now)

Author

q-uint commented Mar 24, 2026

Hi Claudio (@crusso), this was just me going on a wild adventure. My motives are just fun and random perf improvements, and I don't like the number casts personally.

Happy to just close this if it goes against the Motoko design principles, same for the rumored issue #5934. I hope I didn't take up too much time.

Contributor

crusso commented Mar 25, 2026

For perf, I think we could just spot the immediate cast before indexing and do a fastpath.

For convenience, another idea might be to not overload (just) indexing, but allow the compiler to insert NatX -> Nat casts (and IntX -> Int casts) where appropriate, guided by bi-directional type checking. I can see this getting in the way of tail calls sometimes. Might also be problematic for type argument inference... an application might check with type arguments (and implicit casting), but not without.

ggreif pushed a commit that referenced this pull request Mar 26, 2026
## Problem

Fork PRs don't receive repository secrets. When `CACHIX_AUTH_TOKEN` is
empty, `cachix watch-exec` fails with a `NoSigningKey` error before
tests even start, blocking all CI for external contributors.

## Solution

Make the `cachix watch-exec` wrapper conditional in the test blueprint
action. When the auth token is absent (fork PR), run
`nix-build-uncached` directly without the push wrapper.

- The `cachix-action` step already handles an empty token gracefully —
it configures read-only substituter access via `cachix use`, so cached
derivations are still fetched.
- Only the push of newly-built derivations is lost, which is acceptable
for fork PRs.
- The token is passed via an `env` variable (not inline) to avoid
leaking it in logs.
- No caller changes needed — `${{ secrets.CACHIX_AUTH_TOKEN }}`
naturally resolves to empty for fork PRs.

Motivated by PR #5929.

Made with [Cursor](https://cursor.com)
@skilesare

I'll argue for doing something. This is having a material effect on some things that I'm doing that are currently blowing out the instruction limit. By my calculations, I'm doing over 5.1 million iterations over various different arrays...many of which are small 4-item arrays that could really benefit from a Nat8 loop given @q-uint's numbers above.

I get the KISS stuff, but the reality of the language is that right now it is more performant to do

  var i64 : Nat64 = 0;
  while (i64 <= 7) { // 0..8
      let i = Nat64.toNat(i64);
      t += n[i] + other.n[i];
      i64 += 1;
  };

than

  var i : Nat = 0;
  while (i <= 7) { // 0..8
      t += n[i] + other.n[i];
      i += 1;
  };

...which seems kind of not simple. At least from the user's perspective.

Author

q-uint commented Mar 27, 2026

Claudio's suggestion targets a different scenario than what the PR handles.

My changes make Nat64 (and other NatX) first-class index types. When you write arr[myNat64], codegen sees e2 : Nat64 and emits the dedicated idx_nat64 path directly: unbox, bounds-check, done.

But if someone writes arr[Nat64.toNat(x)], the index expression has type Nat, so codegen hits the plain-Nat branch with the speculative scalar check and potential idx_bigint fallback, even though x was just a Nat64.

The suggestion: pattern-match in codegen to recognize a toNat cast wrapping a NatX, peel it off, and emit the idx_nat64 path using the original x directly. A codegen peephole, no type system changes.
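
In other words, the two approaches cover different source spellings (a sketch with illustrative names; `arr` is some array in scope):

```motoko
let x : Nat64 = 3;

let a = arr[x];               // this PR: Nat64 accepted as an index type
let b = arr[Nat64.toNat(x)];  // peephole suggestion: cast recognized in
                              // codegen and elided, no type-system change
```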

| | My PR | Suggestion |
|---|---|---|
| Where | Type system + codegen | Codegen only (peephole) |
| Handles | `arr[myNat64]` | `arr[Nat64.toNat(x)]` |
| How | New index types accepted | Recognize cast in AST, elide it |
| Benefit | Best perf (-30-59%), clean types | -19%, optimizes legacy code patterns |

Contributor

crusso commented Apr 1, 2026

> Benefit: Best perf (-30-59%), clean types | -19%, optimizes legacy code patterns

@Quint - did you actually try my peephole suggestion? Any idea why it gets less perf back than yours? Did you also apply the peephole for indexing on left of an assignment?

Recognize arr[natNToNat(x)] patterns at codegen time and compile
x directly as its native fixed-width type, bypassing the bigint
conversion entirely. Handles both direct prim calls and PrimWrapper
function calls (e.g. nat64ToNat from the prelude).
Author

q-uint commented Apr 1, 2026

Seems like I didn't test it properly.

Benchmark (1000 x 256 accesses, enhanced)

| Index type | Before | After | Change |
|---|---|---|---|
| Nat | 25,687,306 | 25,687,306 | -- |
| Nat8 | 11,036,370 | 11,036,370 | -- |
| Nat16 | 11,079,329 | 11,079,329 | -- |
| Nat32 | 10,567,324 | 10,567,324 | -- |
| Nat64 | 17,991,325 | 17,991,325 | -- |
| Nat64.toNat() | 20,807,325 | 17,991,325 | -13.5% |

arr[nat64ToNat(n)] now compiles to the same code as arr[n] with a native Nat64 index.

Revert all type/interpreter changes that allowed fixed-width Nat
types as direct array/blob indices. The peephole optimization in
the codegen (unwrap_toNat) achieves the same performance benefit
by recognizing arr[natNToNat(x)] patterns and eliding the
conversion, without any language-level changes.
Author

q-uint commented Apr 1, 2026

The peephole produces identical instruction counts to the native type-change approach:

| Index path | Type change | Peephole (toNat) |
|---|---|---|
| Nat8 | 11,036,370 | 11,036,370 |
| Nat16 | 11,079,329 | 11,079,329 |
| Nat32 | 10,567,324 | 10,567,324 |
| Nat64 | 17,991,325 | 17,991,325 |

Zero overhead from the toNat() call, it's completely elided at compile time.

Contributor

crusso commented Apr 1, 2026

So maybe that would be good enough, without overloading indexing. Funnily enough, I just started on a PR to do that, but maybe you can extract one that just does the peephole optimization (or I steal some of your code as well as your test ;-))

If so, it's probably worth double checking the optimization still kicks in if we call

  arr[Nat64.toNat(n)]
  arr[n.toNat()]

Comment thread src/codegen/compile_enhanced.ml Outdated
Comment thread test/bench/nat64-index.mo
Comment thread src/codegen/compile_classical.ml Outdated
negative Int that somehow got here) falls through to
Arr.idx_bigint, which handles the full conversion.

Both array and index are saved to locals because wasm
Contributor


I wonder if that's actually true or just a limitation of our E.if_ helper that doesn't let you supply an operand stack type for each branch.

Wasm 1.0 didn't allow the branches to assume the stack is non-empty. But Wasm 2.0 does.
Unfortunately, I'm not entirely sure we are allowed to use Wasm 2.0 features.

@ggreif are we good with Wasm 2.0 instructions?

If so, we don't need new locals and can avoid the stores and loads of get_va and get_vi.

Recognize arr[n.toNat()] patterns at codegen time and elide the
conversion, compiling the inner value directly as its native type.
Works through prim calls, core library, and method syntax.

Remove the speculative scalar fast path for plain Nat indexing
and revert type system changes for fixed-width Nat indices.
| Ir.PrimE (Ir.CallPrim _,
[{it = Ir.PrimE (Ir.DotPrim name, [{it = Ir.VarE (_, mod_var); _}]); _}; inner]) ->
begin match VarEnv.lookup_var ae mod_var with
| Some (VarEnv.Const (Const.Obj fs)) ->
Contributor


I guess one could even iterate this for deeper projection paths, but probably not worth it.

Contributor

crusso commented Apr 2, 2026

I have a hunch this approach still won't catch compound array assignments that combine an operator with update, like `a[i] += 1`. Do we care? I'm not sure I do.

Author

q-uint commented Apr 2, 2026

No, a[i] += 1 won't hit the peephole, but it doesn't matter I think? The parser desugars it into let tmp = i; a[tmp] = a[tmp] + 1, so the toNat call binds once into a plain Nat local and both array accesses reuse it.
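
Roughly, assuming the desugaring described above (the temporary's name is illustrative):

```motoko
// a[i] += 1 desugars roughly to:
let tmp = i;           // a toNat cast in the index evaluates once here
a[tmp] := a[tmp] + 1;  // both accesses reuse the plain Nat local,
                       // so there is no cast left for the peephole to elide
```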

Contributor

crusso commented Apr 2, 2026

> No, a[i] += 1 won't hit the peephole, but it doesn't matter I think? The parser desugars it into let tmp = i; a[tmp] = a[tmp] + 1, so the toNat call binds once into a plain Nat local and both array accesses reuse it.

Also, our desugaring is pretty inefficient duplicating the field/index computation (and bounds check). So we could optimize this way more anyway (but not in this PR). Opened an issue for that.

#5972

Contributor

crusso commented Apr 2, 2026

(If the benchmark fails, I think you need to run make accept to update the numbers to use eop. If you just run ../run.sh -a -p it will run the benchmarks with 32-bit.)

Or use

EXTRA_MOC_ARGS=--enhanced-orthogonal-persistence ../run.sh -a -p <test>.mo

Author

q-uint commented Apr 2, 2026

The Create GitHub App Token step in test.yml (line 71-76) runs unconditionally, but fork PRs don't have access to vars.GENERIC_CI_RW_APP_ID or secrets.GENERIC_CI_RW_APP_PRIVATE_KEY, causing this failure:

Error: [@octokit/auth-app] appId option is required

The perf step already guards on is_fork (line 86), but the token creation step itself needs a condition too, e.g.:

- name: Create GitHub App Token
  if: github.event.pull_request.head.repo.full_name == github.repository
  uses: actions/create-github-app-token@v3

Without this, every fork PR fails at this step.

@Kamirus

@skilesare

I kind of ran into this again with Array.tabulate... looks like there is a Prim function that does this that has Nat hardcoded as the func parameter... I wonder if there are savings there as well?
