chore: parallelise RTS test variants by ggreif · Pull Request #6029 · caffeinelabs/motoko

ggreif · 2026-04-18T08:34:57Z

Summary

Parallelise the RTS test suite for significant wall clock reduction.

Phase A: Variant-level parallelism

Separate CARGO_TARGET_DIR per variant (target-<name>) to avoid cargo lock contention
define/eval Makefile template generates build + per-module run targets
make -j8 test in nix checkPhase

Phase B: Per-module parallelism via `wasmtime --invoke`

Each test module gets a #[no_mangle] pub extern "C" fn test_<mod>() entry point
Makefile runs wasmtime --invoke test_<mod> per module — works on wasm64-unknown-unknown without WASI
3 variants × 24 modules = 72 parallel targets
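A minimal sketch of what the per-module entry points might look like, assuming each test module exposes a plain `test()` function. The module names, the `record_run` counter, and the `test_entry_point!` macro are hypothetical and only serve the illustration; the actual RTS test crate may be organised differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter so the sketch has an observable effect.
static RUNS: AtomicUsize = AtomicUsize::new(0);

fn record_run() {
    RUNS.fetch_add(1, Ordering::SeqCst);
}

/// Number of module tests executed so far (illustration only).
pub fn runs() -> usize {
    RUNS.load(Ordering::SeqCst)
}

// Stand-ins for real test modules; the names are assumptions.
mod gc {
    pub fn test() {
        super::record_run(); // real assertions would live here
    }
}

mod persistence_small {
    pub fn test() {
        super::record_run();
    }
}

/// Generates one exported, unmangled entry point per test module, so that
/// a runner can start exactly one module by its symbol name.
macro_rules! test_entry_point {
    ($entry:ident, $module:ident) => {
        #[no_mangle]
        pub extern "C" fn $entry() {
            $module::test();
        }
    };
}

test_entry_point!(test_gc, gc);
test_entry_point!(test_persistence_small, persistence_small);
```

With entry points of this shape, each module can be started in isolation by passing its symbol name to `wasmtime --invoke`, which is what lets the Makefile treat every module as an independent target.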

Phase C: GC seed chunking

Split 100 GC random seeds into 10 chunks of 10 (test_gc_chunk_0..9)
Separate gc_predefined (hand-crafted heaps) from gc_components (incremental/compacting internals)
Split persistence into persistence_small (up to 10k objects) and persistence_20k (20k objects)
Heavy tests ordered first in TEST_MODULES so make -j starts long poles early
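The seed chunking above can be sketched as follows. This is an illustration under assumptions: seeds are taken to be numbered from 0, and `run_gc_seed` is a hypothetical stand-in for the real per-seed GC test.

```rust
use std::ops::Range;

/// Totals taken from the description: 100 seeds in chunks of 10.
const SEEDS: u64 = 100;
const CHUNK_SIZE: u64 = 10;

/// Seeds covered by the 0-based chunk index `chunk`.
/// Numbering seeds from 0 is an assumption of this sketch.
pub fn seeds_for_chunk(chunk: u64) -> Range<u64> {
    assert!(chunk < SEEDS / CHUNK_SIZE, "chunk index out of range");
    let start = chunk * CHUNK_SIZE;
    start..start + CHUNK_SIZE
}

/// Hypothetical stand-in for running the GC test with one random seed.
fn run_gc_seed(_seed: u64) {}

/// What an entry point such as `test_gc_chunk_3` might delegate to.
/// Returns how many seeds were exercised.
pub fn run_chunk(chunk: u64) -> usize {
    let mut executed = 0;
    for seed in seeds_for_chunk(chunk) {
        run_gc_seed(seed);
        executed += 1;
    }
    executed
}
```

Because the chunks are disjoint and together cover every seed, the ten targets can run concurrently without changing which seeds are tested.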

Dynamic test seeds

Stabilization small tests use a seed derived from git rev-parse HEAD at build time
Each commit tests different random heap configurations automatically
Fallback to fixed seed 4711 when not in a git repo (nix sandbox)
The heavy persistence_20k test uses fixed seed 4711 for predictable CI runtime
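One way the seed derivation could work is sketched below. The function name `seed_from_rev` and the choice to use the first 16 hex digits of the hash are assumptions for illustration; only the fallback value 4711 comes from the description.

```rust
/// Fixed fallback seed named in the description.
const FALLBACK_SEED: u64 = 4711;

/// Derives a seed from a commit hash as printed by `git rev-parse HEAD`.
/// Falls back to the fixed seed when no hash is available (for example in
/// a sandbox without a git checkout) or when the input is not a hex hash.
/// Taking the first 16 hex digits is an assumption of this sketch.
pub fn seed_from_rev(rev: Option<&str>) -> u64 {
    rev.map(str::trim)
        .and_then(|r| r.get(..16))
        .and_then(|prefix| u64::from_str_radix(prefix, 16).ok())
        .unwrap_or(FALLBACK_SEED)
}
```

How the hash reaches the test binary is not shown in the description. One plausible route is a build script that exports it through a `cargo:rustc-env` directive, so that the test code can read it with `option_env!`.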

Bug fix: heap size scaling

heap_size_for_gc for incremental GC ignored total_heap_size_bytes, always returning 3 * PARTITION_SIZE (192 MB)
For seeds generating large object graphs, the dynamic heap exceeded this fixed size
Fix: max(3 * PARTITION_SIZE, 2 * total_heap_size_bytes)
Discovered via seed 20_000 which generates a dense 20k-object graph
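The before/after behaviour of the fix can be sketched as below. The function names are simplified stand-ins for `heap_size_for_gc`, and the 64 MiB value of `PARTITION_SIZE` is inferred from "3 * PARTITION_SIZE (192 MB)" rather than taken from the RTS source.

```rust
/// Assumed partition size: 64 MiB, inferred from the 192 MB figure above.
const PARTITION_SIZE: u64 = 64 * 1024 * 1024;

/// Behaviour before the fix: the requested size is ignored.
pub fn heap_size_before(_total_heap_size_bytes: u64) -> u64 {
    3 * PARTITION_SIZE
}

/// Behaviour after the fix: never below three partitions, but growing
/// with the actual heap content for large object graphs.
pub fn heap_size_after(total_heap_size_bytes: u64) -> u64 {
    (3 * PARTITION_SIZE).max(2 * total_heap_size_bytes)
}
```

Under these assumptions, a 200 MiB object graph gets a 400 MiB heap after the fix, whereas the old computation would have returned 192 MiB, which is smaller than the content itself.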

Other improvements

test -f guard before wasmtime to fail fast if cargo didn't produce the binary
WASMTIME_BACKTRACE_DETAILS=1 for better crash diagnostics
Removed unnecessary unsafe blocks in test entry points

Observed speedup: from 2+ hours sequential to ~25 minutes parallel on macOS (limited by persistence_20k — Amdahl's law).

Test plan

CI green

🤖 Generated with Claude Code

Use CARGO_TARGET_DIR per test variant (target-ni, target-inc, target-64) to avoid cargo lock contention, enabling `make -j3 test` to build and run all three RTS test variants in parallel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Factor out the repeated test build/run pattern into a reusable test_variant macro. The cargo target dir is derived from the make target name (target-<name>). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-18T09:49:56Z

Comparing from f996dfa to a6f03f7:
The produced WebAssembly code seems to be completely unchanged.
In terms of gas, no changes are observed in 5 tests.
In terms of size, no changes are observed in 5 tests.

- make -j3 → make -j: the number of test variants is the natural limit
- test -f on the wasm binary before wasmtime: fail fast if cargo didn't produce the binary (wasmtime may return 0 on missing file)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add RTS_TEST_FILTER via wasmtime --invoke for per-module entry points
- Split 100 GC random seeds into 10 chunks of 10 (test_gc_chunk_0..9)
- Separate gc_predefined (hand-crafted heaps + components) from random seeds
- 3 variants × 21 modules = 63 parallel wasmtime targets
- Trace markers (>>> <<<) for build diagnostics
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Separate gc_predefined (3 hand-crafted heaps) from gc_components (incremental/compacting/generational internal tests)
- Split persistence into persistence_small (up to 10k objects) and persistence_20k (the heavy 20k serialization test)
- Order TEST_MODULES heaviest-first so make -j starts long poles early
- Make incremental GC sub-modules public for per-component entry points
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Seed 20_000 caused a slice_index_fail in heap construction. Use the same seed as the other stabilization tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Unlimited -j with 72 parallel wasmtime targets can exhaust memory on CI runners. Cap at 8 concurrent processes as a safe default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The CI failure was likely OOM from unbounded parallelism, not the seed. With -j8 cap, seed 20_000 should work. Remove >>> <<< debug traces. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

heap_size_for_gc ignored total_heap_size_bytes, always returning 3*PARTITION_SIZE (192 MB). For seeds that generate large object graphs (e.g. seed 20_000 with 20k objects), the dynamic heap exceeds this fixed size, causing slice_index_fail in create_dynamic_heap. Fix: use max(3*PARTITION_SIZE, 2*total_heap_size_bytes) so the heap grows to fit the actual content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Vary the RNG seed across commits so different random heaps are tested over time. Seed is derived from git rev-parse HEAD at build time, with fallback to "4711" when not in a git repo (nix sandbox). Also enable WASMTIME_BACKTRACE_DETAILS=1 for better crash diagnostics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gc::*, stable_option::test(), and stabilization sub-tests are safe functions — no unsafe block needed. Also run cargo fmt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… only The 20k test is too expensive to risk worst-case seeds. Keep it deterministic with a known-good seed. Small tests vary per commit to explore different heap shapes over time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alexandru-uta · 2026-04-20T07:02:22Z

What are the added benefits? Is it really faster, how much?

ggreif · 2026-04-20T07:51:57Z

> What are the added benefits? Is it really faster, how much?

The run time is still dominated by the 20000-tree, but at least that one now runs in parallel with the others. I was fed up with the slowness on the Mac, so this might help. But I haven't done A/B testing yet.

The other thing is that this introduces different random seeds per 10000-tree. The fixed seed is kept for the big one, for fewer surprises in run time.

ggreif · 2026-04-20T08:12:39Z

Keeping this as draft, as I am brainstorming how the bottleneck can be improved.

ggreif requested a review from a team as a code owner April 18, 2026 08:34

ggreif self-assigned this Apr 18, 2026

ggreif added the testing Related to test suite label Apr 18, 2026

ggreif force-pushed the gabor/rts-parallel-tests branch from 6479860 to 4509ad2 Compare April 18, 2026 09:04

refactor: use define/eval template for RTS test variants

61dee13

ggreif force-pushed the gabor/rts-parallel-tests branch from 4509ad2 to 61dee13 Compare April 18, 2026 09:15

ggreif changed the title ~~perf: parallelise RTS test variants~~ chore: parallelise RTS test variants Apr 18, 2026

ggreif mentioned this pull request Apr 18, 2026

perf: RTS test64 GC tests allocate 192MB per seed (should be much smaller) #6030

Closed

ggreif and others added 10 commits April 18, 2026 12:41

fix: use known-good seed 4711 for persistence_20k

2a0e9b9

fix: cap test parallelism at -j8 to avoid OOM on CI

007971a

chore: revert seed to 20_000, remove trace markers

3d8bfe3

chore: remove unnecessary unsafe blocks, cargo fmt

73c9bb1

ggreif force-pushed the gabor/rts-parallel-tests branch from 63e23f9 to a6f03f7 Compare April 18, 2026 16:20

ggreif marked this pull request as draft April 20, 2026 08:10

ggreif wants to merge 12 commits into master from gabor/rts-parallel-tests

2 participants
