
chore: parallelise RTS test variants #6029

Draft
ggreif wants to merge 12 commits into master from gabor/rts-parallel-tests

Conversation

@ggreif
Contributor

@ggreif ggreif commented Apr 18, 2026

Summary

Parallelise the RTS test suite for significant wall clock reduction.

Phase A: Variant-level parallelism

  • Separate CARGO_TARGET_DIR per variant (target-<name>) to avoid cargo lock contention
  • define/eval Makefile template generates build + per-module run targets
  • make -j8 test in nix checkPhase

Phase B: Per-module parallelism via wasmtime --invoke

  • Each test module gets a #[no_mangle] pub extern "C" fn test_<mod>() entry point
  • Makefile runs wasmtime --invoke test_<mod> per module — works on wasm64-unknown-unknown without WASI
  • 3 variants × 24 modules = 72 parallel targets
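
A minimal sketch of the per-module entry points described above. The module names `bitrel` and `text` and their bodies are hypothetical stand-ins, not the actual RTS test modules; only the `#[no_mangle] pub extern "C" fn test_<mod>()` shape is taken from this description.

```rust
// Hypothetical test modules; the real RTS modules have different contents.
mod bitrel {
    pub fn test() {
        assert_eq!(0b1010u32.count_ones(), 2);
    }
}

mod text {
    pub fn test() {
        assert_eq!("abc".len(), 3);
    }
}

// One exported symbol per module, so a runner can select a module by name.
// `#[no_mangle]` keeps the symbol name unmangled and `extern "C"` gives it a
// C ABI with no arguments and no return value.
// Note: Rust edition 2024 spells the attribute `#[unsafe(no_mangle)]`.
#[no_mangle]
pub extern "C" fn test_bitrel() {
    bitrel::test();
}

#[no_mangle]
pub extern "C" fn test_text() {
    text::test();
}
```

Because each export takes no arguments, a runner can call it by name (for example `--invoke test_bitrel`) without any argument parsing inside the module.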

Phase C: GC seed chunking

  • Split 100 GC random seeds into 10 chunks of 10 (test_gc_chunk_0..9)
  • Separate gc_predefined (hand-crafted heaps) from gc_components (incremental/compacting internals)
  • Split persistence into persistence_small (up to 10k objects) and persistence_20k (20k objects)
  • Heavy tests ordered first in TEST_MODULES so make -j starts long poles early
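
The seed chunking can be sketched as follows. This assumes the 100 seeds are numbered 0..99 and that chunk k covers seeds 10k to 10k + 9; the helper names `chunk_seeds` and `run_chunk` are illustrative and not taken from the RTS sources.

```rust
use std::ops::Range;

const TOTAL_SEEDS: u64 = 100;
const CHUNK_SIZE: u64 = 10;

/// Seeds covered by one chunk: chunk k covers the half-open range [10k, 10k + 10).
pub fn chunk_seeds(chunk: u64) -> Range<u64> {
    assert!(chunk < TOTAL_SEEDS / CHUNK_SIZE, "chunk index out of range");
    let start = chunk * CHUNK_SIZE;
    start..start + CHUNK_SIZE
}

/// Run a (hypothetical) randomized GC test once per seed in the chunk and
/// return how many seeds were exercised.
pub fn run_chunk<F: FnMut(u64)>(chunk: u64, mut gc_test: F) -> usize {
    let mut count = 0;
    for seed in chunk_seeds(chunk) {
        gc_test(seed);
        count += 1;
    }
    count
}
```

Each `test_gc_chunk_<k>` entry point would then reduce to a single `run_chunk(k, ...)` call, so the ten chunks can be scheduled independently.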

Dynamic test seeds

  • Stabilization small tests use a seed derived from git rev-parse HEAD at build time
  • Each commit tests different random heap configurations automatically
  • Fallback to fixed seed 4711 when not in a git repo (nix sandbox)
  • The heavy persistence_20k test uses fixed seed 4711 for predictable CI runtime
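
One way the build-time seed derivation could look, for example in a build script. How the PR actually maps the commit hash to a seed is not stated here, so taking the first 16 hex digits as a `u64` is an assumption, as are the function names.

```rust
use std::process::Command;

/// Fixed fallback seed, used when no commit hash is available.
const FALLBACK_SEED: u64 = 4711;

/// Assumed mapping: interpret the first 16 hex digits of the hash as a u64.
/// Returns None for input that is too short or not hexadecimal.
pub fn seed_from_hash(hash: &str) -> Option<u64> {
    let prefix = hash.trim().get(..16)?;
    u64::from_str_radix(prefix, 16).ok()
}

/// Ask git for HEAD; fall back to the fixed seed when git is missing, the
/// directory is not a repository, or the output cannot be parsed.
pub fn build_seed() -> u64 {
    Command::new("git")
        .args(&["rev-parse", "HEAD"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .and_then(|out| String::from_utf8(out.stdout).ok())
        .and_then(|s| seed_from_hash(&s))
        .unwrap_or(FALLBACK_SEED)
}
```

Keeping the parsing step separate from the `git` invocation makes the fallback path easy to exercise without a repository.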

Bug fix: heap size scaling

  • heap_size_for_gc for incremental GC ignored total_heap_size_bytes, always returning 3 * PARTITION_SIZE (192 MB)
  • For seeds generating large object graphs, the dynamic heap exceeded this fixed size
  • Fix: max(3 * PARTITION_SIZE, 2 * total_heap_size_bytes)
  • Discovered via seed 20_000 which generates a dense 20k-object graph
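
The sizing change can be illustrated with the sketch below. The function names and signatures are hypothetical; `PARTITION_SIZE` is inferred as 64 MiB from the statement that 3 * PARTITION_SIZE is 192 MB.

```rust
/// Inferred from "3 * PARTITION_SIZE (192 MB)"; the real constant lives in the RTS.
const PARTITION_SIZE: u64 = 64 * 1024 * 1024;

/// Behaviour before the fix: the requested content size is ignored.
pub fn heap_size_before(_total_heap_size_bytes: u64) -> u64 {
    3 * PARTITION_SIZE
}

/// Behaviour after the fix: never smaller than three partitions, but grows
/// to twice the content size for large object graphs.
pub fn heap_size_after(total_heap_size_bytes: u64) -> u64 {
    (3 * PARTITION_SIZE).max(2 * total_heap_size_bytes)
}
```

With these definitions, a 150 MiB object graph gets a 300 MiB heap after the fix, where the old computation would still have returned 192 MiB.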

Other improvements

  • test -f guard before wasmtime to fail fast if cargo didn't produce the binary
  • WASMTIME_BACKTRACE_DETAILS=1 for better crash diagnostics
  • Removed unnecessary unsafe blocks in test entry points

Observed speedup: from 2+ hours sequential to ~25 minutes parallel on macOS (limited by persistence_20k — Amdahl's law).
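
Why a single long test caps the speedup can be seen from a simple lower bound on wall-clock time. This is a sketch in the spirit of the Amdahl's law remark, not a measurement: the function name is illustrative, and any durations passed to it are hypothetical.

```rust
/// Lower bound on wall-clock time for independent tasks on `jobs` workers:
/// no schedule can finish before the longest single task, nor before the
/// total work divided evenly across all workers.
pub fn wall_clock_lower_bound(durations_min: &[f64], jobs: usize) -> f64 {
    assert!(jobs > 0, "need at least one worker");
    let total: f64 = durations_min.iter().sum();
    let longest = durations_min.iter().cloned().fold(0.0, f64::max);
    longest.max(total / jobs as f64)
}
```

For instance, with one 25-minute task and three 5-minute tasks, the bound stays at 25 minutes for any number of workers, which is why starting the long pole first matters more than raising `-j`.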

Test plan

  • CI green

🤖 Generated with Claude Code

@ggreif ggreif requested a review from a team as a code owner April 18, 2026 08:34
@ggreif ggreif self-assigned this Apr 18, 2026
@ggreif ggreif added the testing Related to test suite label Apr 18, 2026
Use CARGO_TARGET_DIR per test variant (target-ni, target-inc, target-64)
to avoid cargo lock contention, enabling `make -j3 test` to build and
run all three RTS test variants in parallel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ggreif ggreif force-pushed the gabor/rts-parallel-tests branch from 6479860 to 4509ad2 April 18, 2026 09:04
Factor out the repeated test build/run pattern into a reusable
test_variant macro. The cargo target dir is derived from the make
target name (target-<name>).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ggreif ggreif force-pushed the gabor/rts-parallel-tests branch from 4509ad2 to 61dee13 April 18, 2026 09:15
@ggreif ggreif changed the title from "perf: parallelise RTS test variants" to "chore: parallelise RTS test variants" Apr 18, 2026
@github-actions
Contributor

github-actions Bot commented Apr 18, 2026

Comparing from f996dfa to a6f03f7:
The produced WebAssembly code seems to be completely unchanged.
In terms of gas, no changes are observed in 5 tests.
In terms of size, no changes are observed in 5 tests.

ggreif and others added 10 commits April 18, 2026 12:41
- make -j3 → make -j: the number of test variants is the natural limit
- test -f on the wasm binary before wasmtime: fail fast if cargo didn't
  produce the binary (wasmtime may return 0 on missing file)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add RTS_TEST_FILTER via wasmtime --invoke for per-module entry points
- Split 100 GC random seeds into 10 chunks of 10 (test_gc_chunk_0..9)
- Separate gc_predefined (hand-crafted heaps + components) from random seeds
- 3 variants × 21 modules = 63 parallel wasmtime targets
- Trace markers (>>> <<<) for build diagnostics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Separate gc_predefined (3 hand-crafted heaps) from gc_components
  (incremental/compacting/generational internal tests)
- Split persistence into persistence_small (up to 10k objects) and
  persistence_20k (the heavy 20k serialization test)
- Order TEST_MODULES heaviest-first so make -j starts long poles early
- Make incremental GC sub-modules public for per-component entry points

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seed 20_000 caused a slice_index_fail in heap construction.
Use the same seed as the other stabilization tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unlimited -j with 72 parallel wasmtime targets can exhaust memory
on CI runners. Cap at 8 concurrent processes as a safe default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CI failure was likely OOM from unbounded parallelism, not the seed.
With -j8 cap, seed 20_000 should work. Remove >>> <<< debug traces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
heap_size_for_gc ignored total_heap_size_bytes, always returning
3*PARTITION_SIZE (192 MB). For seeds that generate large object graphs
(e.g. seed 20_000 with 20k objects), the dynamic heap exceeds this
fixed size, causing slice_index_fail in create_dynamic_heap.

Fix: use max(3*PARTITION_SIZE, 2*total_heap_size_bytes) so the heap
grows to fit the actual content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Vary the RNG seed across commits so different random heaps are tested
over time. Seed is derived from git rev-parse HEAD at build time,
with fallback to "4711" when not in a git repo (nix sandbox).

Also enable WASMTIME_BACKTRACE_DETAILS=1 for better crash diagnostics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gc::*, stable_option::test(), and stabilization sub-tests are safe
functions — no unsafe block needed. Also run cargo fmt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… only

The 20k test is too expensive to risk worst-case seeds. Keep it
deterministic with a known-good seed. Small tests vary per commit
to explore different heap shapes over time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ggreif ggreif force-pushed the gabor/rts-parallel-tests branch from 63e23f9 to a6f03f7 April 18, 2026 16:20
@alexandru-uta
Contributor

What are the added benefits? Is it really faster, how much?

@ggreif
Contributor Author

ggreif commented Apr 20, 2026

> What are the added benefits? Is it really faster, how much?

It is still dominated by the 20000-tree, but at least this one now runs in parallel to the others. I was fed up with the slowness on the Mac, so this might help. But I haven't done A/B testing yet.

The other thing is that this introduces different random seeds for the tests with up to 10000 objects. The fixed seed is kept for the big one, for fewer surprises in run time.

@ggreif ggreif marked this pull request as draft April 20, 2026 08:10
@ggreif
Contributor Author

ggreif commented Apr 20, 2026

Keeping this as draft, as I am brainstorming how the bottleneck can be improved.


Labels

testing Related to test suite

2 participants