feat: support fixed-width Nat types as array and blob indices#5929
feat: support fixed-width Nat types as array and blob indices#5929q-uint wants to merge 13 commits intocaffeinelabs:masterfrom
Conversation
Allow Nat64 values as array indices, bypassing the bignum-to-word64 conversion that Nat indices require. On the IC, this reduces array access instructions by ~30% when the index is already a Nat64. Changes span the type checker (accept Nat64 in index position), both codegen backends (dispatch to direct word64 indexing), the IR checker, and both interpreters.
The IdxE handler in interpret.ml dispatched on Nat64 before matching on Blob vs array, meaning blob[nat64_val] would have succeeded if it reached the interpreter. The type checker already rejects this, so it was unreachable, but the contract was looser than necessary. Move the Nat64 dispatch inside the array-only branch.
Extend indexing to accept Nat8, Nat16, Nat32 in addition to Nat64. All fixed-width Nat types compile to direct word-sized index operations, bypassing bignum conversion entirely. IC instruction counts (1000 iters x 256 elements): Nat (baseline): 25,687,306 Nat8: 11,036,370 (-57%) Nat16: 11,079,329 (-57%) Nat32: 10,567,324 (-59%) Nat64: 17,991,325 (-30%) Nat64+toNat(): 20,807,325 (-19%) Also extends Nat64 indexing to blobs, scopes interpreter Nat64 dispatch to the correct branches, and uses T.sub uniformly in place of the T.sub/T.eq asymmetry.
|
CI is failing because |
|
Just an idea. Normally a small |
|
Tried this out: mask bit 0 (scalar tag) and bit 63 (sign), and if both zero, shift right to recover the raw index and go straight to Benchmark results (256K accesses, enhanced backend):
But maybe I interpreted the suggestion wrong? |
When indexing with plain Nat, the compiler always called idx_bigint, paying for a function call and error message local setup even when the index is a small tagged scalar (the common case). Inline the scalar check at the four indexing sites (array and blob, classical and enhanced): mask the tag and sign bits, and if both zero, shift to recover the raw index and call idx directly, bypassing idx_bigint entirely. Heap bigints fall through to the existing path. Benchmark (enhanced, 256K accesses): Nat index -0.7% instructions.
## Problem
Fork PRs don't receive repository secrets. When `CACHIX_AUTH_TOKEN` is
empty, `cachix watch-exec` fails with a `NoSigningKey` error before
tests even start, blocking all CI for external contributors.
## Solution
Make the `cachix watch-exec` wrapper conditional in the test blueprint
action. When the auth token is absent (fork PR), run
`nix-build-uncached` directly without the push wrapper.
- The `cachix-action` step already handles an empty token gracefully —
it configures read-only substituter access via `cachix use`, so cached
derivations are still fetched.
- Only the push of newly-built derivations is lost, which is acceptable
for fork PRs.
- The token is passed via an `env` variable (not inline) to avoid
leaking it in logs.
- No caller changes needed — `${{ secrets.CACHIX_AUTH_TOKEN }}`
naturally resolves to empty for fork PRs.
Motivated by PR #5929.
Made with [Cursor](https://cursor.com)
Now the CI works with forks as well |
Nat64 index type checking now infers index sub-expressions, which correctly surfaces M0155 (operator may trap) for Nat subtraction in array indices.
|
Hi Quint, Thanks for the PR. It looks like you are essentially overloading array indexing on all unsigned types. However, this seems to fly against the Motoko KISS principle (although maybe that ship sailed long ago), especially not allowing implicit conversions between types. What is the motivation? If it is just perf, not programmer convenience, I think we could easily enough optimize any Would that be enough for you or do you actually want to avoid the user writing those casts? I'm a little worried that flipping from checking against Nat mode to inferring the type of the index might actually break a fair bit of code. If we really want to support this, maybe just adding explict a.sub8() etc prims would be enough for people that want to micro-optimize their code. BTW, I heard a rumour that you want to extend subtyping and allow NatXX<: Nat but that's a non-starter because Motoko's subtyping is by subsumption, not coercion. That is subtyping always compiles to a no-op, not a cast that transforms the value to a new one. To support NatXX <: Nat we would need to ensure the representation of these values is always the same or modify every operation on Nat to allow all of the NatXX representations as inputs. A bit like you've done for the special case of array indexing. |
|
(labelling this DO-NOT-MERGE for now) |
|
Hi Claudio (@crusso), this was just me going on a wild adventure. My motives are just fun and random perf improvements, and I don't like the number casts personally. Happy to just close this if it goes against the Motoko design principles, same for the rumored issue #5934. I hope I didn't take up too much time. |
|
For perf, I think we could just spot the immediate cast before indexing and do a fastpath. For convenience, another idea might be to not overload (just) indexing, but allow the compiler to insert NatX->Nat casts (and NatX/IntX -> casts) where appropriate, guided by bi-directional type checking. I can see this getting in the way of tail calls sometimes. Might also be problematic for type argument inference... an application might check with type arguments (and implicit casting), but not without. |
## Problem
Fork PRs don't receive repository secrets. When `CACHIX_AUTH_TOKEN` is
empty, `cachix watch-exec` fails with a `NoSigningKey` error before
tests even start, blocking all CI for external contributors.
## Solution
Make the `cachix watch-exec` wrapper conditional in the test blueprint
action. When the auth token is absent (fork PR), run
`nix-build-uncached` directly without the push wrapper.
- The `cachix-action` step already handles an empty token gracefully —
it configures read-only substituter access via `cachix use`, so cached
derivations are still fetched.
- Only the push of newly-built derivations is lost, which is acceptable
for fork PRs.
- The token is passed via an `env` variable (not inline) to avoid
leaking it in logs.
- No caller changes needed — `${{ secrets.CACHIX_AUTH_TOKEN }}`
naturally resolves to empty for fork PRs.
Motivated by PR #5929.
Made with [Cursor](https://cursor.com)
|
I'll argue for doing something. This is having a material effect on some things that I'm doing that are currently blowing out the instruction limit. By my calculations, I'm doing over 5.1 million iterations over various different arrays...many of which are small 4-item arrays that could really benefit from a Nat8 loop given @q-uint's numbers above. I get the KISS stuff, but the reality of the language is that right now it is more performant to do Than |
|
Claudio's suggestion targets a different scenario than what the PR handles. My changes make But if someone writes The suggestion: pattern-match in codegen to recognize a
|
@Quint - did you actually try my peephole suggestion? Any idea why it gets less perf back than yours? Did you also apply the peephole for indexing on left of an assignment? |
Recognize arr[natNToNat(x)] patterns at codegen time and compile x directly as its native fixed-width type, bypassing the bigint conversion entirely. Handles both direct prim calls and PrimWrapper function calls (e.g. nat64ToNat from the prelude).
|
Seems like I didn't test it properly. Benchmark (1000 x 256 accesses, enhanced)
|
Revert all type/interpreter changes that allowed fixed-width Nat types as direct array/blob indices. The peephole optimization in the codegen (unwrap_toNat) achieves the same performance benefit by recognizing arr[natNToNat(x)] patterns and eliding the conversion, without any language-level changes.
|
The peephole produces identical instruction counts to the native type-change approach:
Zero overhead from the |
|
So maybe that would be good enough, without overloading indexing. Funnly enough, I just started on a PR to do that, but maybe you can extract one that just does the peephole optimization (or I steal some of your code as well as your test ;-)) If so, it's probably worth double checking the optimization still kicks in if we call arr[Nat64.toNat(n))] |
| negative Int that somehow got here) falls through to | ||
| Arr.idx_bigint, which handles the full conversion. | ||
|
|
||
| Both array and index are saved to locals because wasm |
There was a problem hiding this comment.
I wonder if that's actually true or just a limitation of our E.if_ helper that doesn't let you supply a operand stack type for each branch.
Wasm 1.0 didn't allow the branches to assume the stack is non-empty. But Wasm 2.0 does.
Unfortunately, I'm not entirely sure we are allowed to use Wasm 2.0 features.
@ggreif are we good with Wasm 2.0 instructions?
If so, we don't need new locals and can avoid the stores and loads of get_va and get_vi.
Recognize arr[n.toNat()] patterns at codegen time and elide the conversion, compiling the inner value directly as its native type. Works through prim calls, core library, and method syntax. Remove the speculative scalar fast path for plain Nat indexing and revert type system changes for fixed-width Nat indices.
| | Ir.PrimE (Ir.CallPrim _, | ||
| [{it = Ir.PrimE (Ir.DotPrim name, [{it = Ir.VarE (_, mod_var); _}]); _}; inner]) -> | ||
| begin match VarEnv.lookup_var ae mod_var with | ||
| | Some (VarEnv.Const (Const.Obj fs)) -> |
There was a problem hiding this comment.
I guess one could even iterate this for deeper projection paths, but probably not worth it.
|
I have hunch this approach still won't catch compound array assignments that combine an operator with update, like |
|
No, |
Also, our desugaring is pretty inefficient duplicating the field/index computation (and bounds check). So we could optimize this way more anyway (but not in this PR). Opened an issue for that. |
|
(if the bench mark fails, I think you need to run make accept to update the numbers to use eop. If you just run ../run.sh -a -p it will run the benchmarks with 32-bit. Or use |
|
The The perf step already guards on - name: Create GitHub App Token
if: github.event.pull_request.head.repo.full_name == github.repository
uses: actions/create-github-app-token@v3Without this, every fork PR fails at this step. |
|
I kind of ran into this again with Array.tabulate...looks like there is a Prim function that does this that has Nat hardcoded as the func paramater...I wonder if there are savings there as well? |
Summary
IC instruction counts (1000 iters x 256 elements)