Architectural decisions for ClojureWasm. Reference by searching ## D##.
Only architectural decisions (new Value variant, subsystem design, etc.) — not bug fixes.
Pruned 2026-02-08: removed 61 historical/superseded/implementation-detail entries. See git history for full archive.
Decision: VM is an explicit struct instance passed as parameter. No global or threadlocal state anywhere.
Rationale (.dev/future.md SS15.5):
- Beta used 8 threadlocal variables in defs.zig, making embedding impossible
- Instantiated VM enables: multiple VMs in one process, library embedding mode, clean testing (each test gets fresh VM)
Known exceptions: macro_eval_env (D15), predicates.current_env (T9.5.5), bootstrap.last_thrown_exception, keyword_intern.table, collections._vec_gen_counter (24C.4), lifecycle.shutdown_requested/hooks (34.5), http_server.build_mode/background_mode/bg_server (34.2)
Decision: Implement TreeWalk evaluator alongside VM from Phase 2. Wire --compare mode immediately.
Rationale (.dev/future.md SS9.2):
- Beta's --compare mode was "the most effective bug-finding tool"
- TreeWalk is simpler to implement correctly (direct Node -> Value)
- VM bugs often produce wrong values silently (not crashes)
Development rule (enforced from Phase 3 onward):
When adding any new feature (builtin, special form, operator), implement it
in both backends and add an EvalEngine.compare() test.
| Component | Path (pre-R8) | Path (post-R8) |
|---|---|---|
| VM | src/vm/vm.zig | src/engine/vm/vm.zig |
| TreeWalk | src/evaluator/tree_walk.zig | src/engine/evaluator/tree_walk.zig |
| EvalEngine | src/runtime/eval_engine.zig | src/engine/eval_engine.zig |
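The dual-backend rule can be sketched as follows. This is a minimal illustration in Python rather than Zig; `eval_tree_walk`, `eval_stack_vm`, and `compare` are hypothetical names and do not reflect the actual EvalEngine API.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_tree_walk(node):
    """Backend A: evaluate a nested-tuple expression directly (Node -> Value)."""
    if isinstance(node, int):
        return node
    op, lhs, rhs = node
    return OPS[op](eval_tree_walk(lhs), eval_tree_walk(rhs))

def compile_to_stack_code(node, code=None):
    """Backend B, step 1: flatten the tree into postfix instructions."""
    code = [] if code is None else code
    if isinstance(node, int):
        code.append(("push", node))
    else:
        op, lhs, rhs = node
        compile_to_stack_code(lhs, code)
        compile_to_stack_code(rhs, code)
        code.append(("apply", op))
    return code

def eval_stack_vm(node):
    """Backend B, step 2: run the instructions on an operand stack."""
    stack = []
    for kind, arg in compile_to_stack_code(node):
        if kind == "push":
            stack.append(arg)
        else:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(OPS[arg](lhs, rhs))
    return stack.pop()

def compare(node):
    """Run both backends and fail loudly if they disagree."""
    a, b = eval_tree_walk(node), eval_stack_vm(node)
    if a != b:
        raise AssertionError(f"backend mismatch: tree_walk={a!r} vm={b!r}")
    return a
```

A silent wrong value in one backend surfaces as a mismatch instead of passing unnoticed, which is the point of wiring the comparison in from the start.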
Decision: All source code, comments, commit messages, PR descriptions, and documentation are in English.
Rationale: OSS readiness from day one. Beta used Japanese comments/commits, which limited accessibility. Agent response language is personal preference (configured in ~/.claude/CLAUDE.md).
Decision: The / operator returns Ratio for non-exact int division, matching JVM.
Ratio type implemented (F3 resolved).
Clojure JVM: (/ 6 3) → 2 (Long), (/ 1 3) → 1/3 (Ratio).
Interim ClojureWasm behavior (before Ratio): (/ 6 3) → 2.0 (float), (/ 1 3) → 0.333... (float).
Original trigger for implementing Ratio: tests failing due to precision loss from float approximation.
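The JVM semantics that the decision targets can be sketched in a few lines. This is an illustration in Python using the standard `fractions` module; `clj_div` is a hypothetical name, not a ClojureWasm function.

```python
from fractions import Fraction

def clj_div(a: int, b: int):
    """Integer division in the JVM Clojure style.

    Exact results stay integers; inexact results become exact ratios.
    """
    if b == 0:
        raise ZeroDivisionError("Divide by zero")
    q = Fraction(a, b)  # normalizes to lowest terms
    return q.numerator if q.denominator == 1 else q
```

With this rule, `clj_div(6, 3)` yields the integer 2 and `clj_div(1, 3)` yields the ratio 1/3, with no float approximation involved.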
Decision: Single callFnVal(allocator, fn_val, args) function in bootstrap.zig.
Routes by Value tag and Fn.kind:
- builtin_fn → direct call
- fn_val (.bytecode) → bytecodeCallBridge (creates new VM instance)
- fn_val (.treewalk) → treewalkCallBridge (creates new TreeWalk)
- multi_fn, keyword, map, set → IFn dispatch
All call sites import bootstrap.callFnVal directly (no callback fields/module vars).
Decision: Replace contiguous capture_base + capture_count with
capture_slots: []const u16 in FnProto. Each slot index is recorded
individually, allowing capture from arbitrary non-contiguous stack positions.
Rationale: Contiguous capture failed when locals occupied non-contiguous stack slots (e.g., self-ref at slot 0, let binding at slot 2, nothing at slot 1).
Decision: Transducer support via 1-arity map/filter, extended conj/deref,
and transduce using plain reduce (not protocol-based coll-reduce).
Key functions: transduce, into (3-arity), cat, halt-when, dedupe,
preserving-reduced, sequence (1-arity).
halt-when uses :__halt instead of ::halt (auto-qualified keywords not supported).
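The idea of transducers layered over plain reduce can be sketched as below. This is a simplified Python illustration: `mapping`, `filtering`, and `comp` are hypothetical stand-ins for the 1-arity map/filter and comp, and the sketch omits the completion arity and `reduced` early termination.

```python
from functools import reduce

def mapping(f):
    """1-arity map: returns a transducer that wraps a reducing function."""
    def xform(rf):
        return lambda acc, x: rf(acc, f(x))
    return xform

def filtering(pred):
    """1-arity filter: passes only items that satisfy pred to rf."""
    def xform(rf):
        return lambda acc, x: rf(acc, x) if pred(x) else acc
    return xform

def comp(*xforms):
    """Compose transducers; the leftmost one sees each input first."""
    def composed(rf):
        for xf in reversed(xforms):
            rf = xf(rf)
        return rf
    return composed

def transduce(xform, rf, init, coll):
    """Apply the transformed reducing function with a plain reduce."""
    return reduce(xform(rf), coll, init)
```

For example, composing `filtering` of odd numbers with `mapping` of squares and summing over `range(6)` reduces 1, 9, and 25 without building an intermediate collection.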
Decision: Threadlocal error state (same pattern as Beta). Module-level
functions setError(), setErrorFmt(), getLastError(), setSourceText().
Rationale: Instance-based ErrorContext (D3a) caused error info loss — context lived on evalString()'s stack, out of scope when errors propagated to main(). Threadlocal eliminates the scope boundary problem. Single-threaded execution means no thread safety concerns.
Decision: Core seq functions (map, filter, take, take-while, concat, range,
mapcat) use lazy-seq/cons in core.clj. realizeValue() in collections.zig
handles transparent lazy→eager conversion at system boundaries.
Realize boundaries: eqFn/neqFn, VM .eq/.neq opcodes, print/pr/println/prn, str/pr-str, valueToForm, withMetaFn.
Decision: Capture defining namespace on Fn objects and restore during
function calls. Unqualified symbol resolution happens in the defining namespace.
- value.zig: Fn.defining_ns: ?[]const u8
- vm.zig: CallFrame.saved_ns saves/restores env.current_ns
- tree_walk.zig: makeClosure/callClosure save/restore namespace
Rationale: JVM Clojure captures Var references at compile time. Our runtime-resolved approach caused cross-namespace shadowing.
Decision: MarkSweepGc in src/common/gc.zig using HashMap-based
allocation tracking (keyed by pointer address).
- Provides std.mem.Allocator interface (alloc/resize/remap/free vtable)
- Provides GcStrategy interface (alloc/collect/shouldCollect/stats vtable)
- HashMap uses backing allocator (not GC allocator) to avoid circular dependency
- Allocation threshold controls shouldCollect() trigger
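A minimal sketch of address-keyed allocation tracking with a threshold trigger, written in Python for illustration. `MarkSweepHeap` and its fields are hypothetical; real addresses are simulated with a counter, and the tracking dict plays the role of the HashMap that lives outside the collected heap.

```python
class MarkSweepHeap:
    def __init__(self, threshold=4):
        self.allocations = {}      # simulated address -> tracking entry
        self.threshold = threshold
        self.next_addr = 0x1000

    def alloc(self, refs=()):
        """Record a new allocation that references other addresses."""
        addr = self.next_addr
        self.next_addr += 8
        self.allocations[addr] = {"refs": list(refs), "marked": False}
        return addr

    def should_collect(self):
        return len(self.allocations) >= self.threshold

    def collect(self, roots):
        """Mark everything reachable from roots, then sweep the rest."""
        stack = list(roots)
        while stack:
            entry = self.allocations.get(stack.pop())
            if entry is None or entry["marked"]:
                continue
            entry["marked"] = True
            stack.extend(entry["refs"])
        dead = [a for a, e in self.allocations.items() if not e["marked"]]
        for addr in dead:
            del self.allocations[addr]
        for entry in self.allocations.values():
            entry["marked"] = False  # reset for the next cycle
        return len(dead)
```

The sweep removes every unmarked entry, which is why data the collector cannot trace has to be kept out of this heap.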
Decision: Three allocator tiers:
- GPA (infra_alloc): Env, Namespace, Var, HashMap backings — stable infrastructure
- node_arena (GPA-backed ArenaAllocator in Env): Reader Forms, Analyzer Nodes — AST data referenced by TreeWalk closures, persists for program lifetime
- GC allocator (gc_alloc): Values (Fn, collections, strings) — mark-sweep collected
Rationale: GC sweep frees ALL unmarked allocations. AST Nodes are not Values and cannot be traced by the GC.
Decision: Always heap-allocate VM structs (via allocator.create(VM)).
The VM struct is ~1.5MB (NaN-boxed: ~256KB) due to fixed-size operand stack.
Stack-allocated VMs cause native stack overflow in nested calls.
Decision: Two-phase bootstrap in loadCore:
- Phase 1: Evaluate core.clj via TreeWalk (fast startup, all functions defined)
- Phase 2: Re-evaluate hot transducer functions (map, filter, comp) via VM compiler, replacing TreeWalk closures with bytecode closures.
evalStringVMBootstrap: Compiles via Compiler+VM, does NOT deinit — FnProtos must persist because they are stored in Vars.
Trade-off: transduce 2134→15ms (142x), startup +5ms.
Decision: Flatten nested filter chains + reuse active VM in callFnVal.
- Filter chain collapsing (value.zig): lazy_filter_chain Meta variant stores flat []const Value of predicates + source. Avoids 168 levels of recursive realize() for sieve-like programs.
- Active VM call bridge (bootstrap.zig): callFnVal checks vm_mod.active_vm before allocating a new VM. Eliminates ~500KB heap allocation per call.
Result: sieve 1645→21ms (78x), memory 2997→24MB (125x).
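Filter chain collapsing can be illustrated with a short Python sketch. `FilterChain` and `lazy_filter` are hypothetical names; the point is only that filtering an existing chain appends a predicate to a flat list instead of nesting another lazy layer.

```python
class FilterChain:
    """A source plus a flat tuple of predicates, applied in order."""

    def __init__(self, source, preds):
        self.source = source
        self.preds = tuple(preds)

    def __iter__(self):
        for x in self.source:
            if all(p(x) for p in self.preds):
                yield x

def lazy_filter(pred, coll):
    """Filter coll lazily, flattening when coll is already a chain."""
    if isinstance(coll, FilterChain):
        return FilterChain(coll.source, coll.preds + (pred,))
    return FilterChain(coll, (pred,))
```

A sieve that stacks one filter per prime therefore iterates a single source with a list of predicates, so realization depth no longer grows with the number of filters.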
Decision: Two Value variants for Wasm FFI:
- wasm_module: *WasmModule, heap-allocated, owns Store/Module/Instance
- wasm_fn: *const WasmFn, bound export name + signature, callable via callFnVal
Namespace: cljw.wasm (D82), registered in registry.zig.
Type conversion: integer↔i32/i64, float↔f32/f64, boolean/nil→i32(0/1).
Decision: Global trampoline + context table (256 slots) for host function injection.
- (wasm/load "m.wasm" {:imports {"env" {"log" clj-fn}}}) registers Clojure fns
- HostContext stores: Clojure fn Value, param/result counts, allocator
- Single hostTrampoline(vm, ctx_id) handles all callbacks
Rationale: Context table (vs closures) because Zig closures cannot be passed as fn pointers.
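The trampoline-plus-table pattern can be sketched as follows. This Python illustration uses hypothetical names (`register_host_fn`, `host_trampoline`) and a plain list as the operand stack; it only shows how one shared entry point can serve many callbacks by looking up per-callback state by slot id.

```python
MAX_SLOTS = 256
_context_table = [None] * MAX_SLOTS  # slot id -> (fn, param_count, result_count)

def register_host_fn(fn, param_count, result_count):
    """Store the callback in the first free slot and return its id."""
    for ctx_id, slot in enumerate(_context_table):
        if slot is None:
            _context_table[ctx_id] = (fn, param_count, result_count)
            return ctx_id
    raise RuntimeError("host context table full")

def host_trampoline(stack, ctx_id):
    """Single entry point: pop args, call the registered fn, push the result."""
    fn, param_count, result_count = _context_table[ctx_id]
    args = [stack.pop() for _ in range(param_count)][::-1]
    result = fn(*args)
    if result_count == 1:
        stack.append(result)
```

Because the trampoline itself carries no captured state, it can be handed out as an ordinary function pointer, with the slot id selecting the context.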
Decision: Defer wasm_rt implementation. Pivot to native production track.
Rationale:
- WasmGC: LLVM cannot emit WasmGC types, no timeline
- Wasmtime GC: Cycle collection unimplemented
- WASI Threads: Specification in flux
- Native track has immediate high-value opportunities
Consequence: wasm_rt deferred until ecosystem matures.
See src/wasm_rt/README.md for revival conditions.
Decision: nREPL uses GPA directly for all allocations — both Env (persistent) and evalString (transient). No ArenaAllocator.
Rationale: ArenaAllocator.free() in Zig 0.15.2 performs "last allocation rollback" optimization. When persistent data (Vars) and transient data share the same arena, free/alloc cycles for transient data can overwrite persistent allocations.
Decision: Generate bootstrap cache at Zig build time, embed as binary data.
User-facing paths: cljw file.clj (run) and cljw build file.clj -o app (single binary).
- registerBuiltins() at startup (Zig function pointers not serializable)
- restoreFromBootstrapCache (replaces loadBootstrapAll)
- Full runtime always included in built binaries
Result: ~6x faster startup (~12ms → ~2ms).
Decision: Two-prefix convention (Babashka model):
- clojure.*: JVM Clojure-compatible namespaces
- cljw.*: ClojureWasm-unique extensions (cljw.wasm, cljw.http, cljw.build)
- user: Default namespace
clojure.java.* names kept for compatibility (matches Babashka's approach).
Decision: cljw.http namespace with Ring-compatible handler model.
- Blocking mode (default): run-server runs accept loop in calling thread
- Background mode (with --nrepl): spawns background thread, returns immediately
- Build mode: returns nil during cljw build to prevent blocking
- Threading: Thread per connection with mutex on handler call
Decision: Custom Wasm runtime in src/wasm/runtime/ replacing zware.
- Switch-based dispatch — works on all Zig backends (cross-compilation)
- Direct bytecode execution — no intermediate representation
- Wasm MVP + WASI Preview 1 — ~200 opcodes + SIMD (236 opcodes), 19 WASI functions
Scope: ~5300 LOC, 8 files. Zero external dependencies.
Decision: 4-heap-tag NaN boxing scheme for Value representation (8 bytes).
Encoding (top 16 bits of u64):
- < 0xFFF9: float (raw f64 bits)
- 0xFFF9: integer (48-bit signed)
- 0xFFFB: constant (nil, true, false)
- 0xFFFC: char (u21 codepoint)
- 0xFFFD: builtin function pointer
- 0xFFF8/0xFFFA/0xFFFE/0xFFFF: heap pointers (3-bit sub-type + 45-bit shifted address)
28 heap types across 4 tags. 8-byte alignment shift (addr >> 3) gives 48-bit effective address range. Negative NaN canonicalized to positive NaN.
Supersedes: D72 (original NaN boxing with 40-bit address, deferred).
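The integer and float cases of the encoding can be sketched in Python. Only the 0xFFF9 integer tag and the NaN canonicalization come from the description above; the function names and the choice of 0x7FF8 as the canonical positive quiet NaN pattern are assumptions made for illustration.

```python
import struct

TAG_INT = 0xFFF9
PAYLOAD_MASK = (1 << 48) - 1
CANONICAL_NAN = 0x7FF8 << 48  # assumed canonical positive quiet NaN

def box_float(x: float) -> int:
    """Floats are stored as raw f64 bits; every NaN maps to one positive NaN."""
    if x != x:
        return CANONICAL_NAN
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def unbox_float(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def box_int(n: int) -> int:
    """Pack a 48-bit signed integer under the integer tag."""
    if not -(1 << 47) <= n < (1 << 47):
        raise OverflowError("value does not fit in a 48-bit signed payload")
    return (TAG_INT << 48) | (n & PAYLOAD_MASK)

def unbox_int(bits: int) -> int:
    """Sign-extend the low 48 bits."""
    payload = bits & PAYLOAD_MASK
    return payload - (1 << 48) if payload & (1 << 47) else payload

def tag_of(bits: int) -> int:
    return bits >> 48
```

Canonicalizing NaN matters because a negative NaN has top bits 0xFFF8, which would otherwise be indistinguishable from a heap-pointer tag.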
Decision: Three targeted optimizations for switch-based Wasm interpreter:
- VM reuse (36.7A): Cache Vm in WasmModule, reset() per invoke
- Branch target precomputation (36.7B): Lazy sidetable in WasmFunction.branch_table
- Memory/local optimization (36.7C): Abandoned, ROI too low
Results (hyperfine, ReleaseSafe):
| Benchmark | Before | After | Speedup |
|---|---|---|---|
| wasm_call | 931ms | 118ms | 7.9x |
| wasm_fib | 11046ms | 7663ms | 1.44x |
| wasm_memory | 192ms | 26ms | 7.4x |
| wasm_sieve | 822ms | 792ms | 1.04x |
Resolved: Register IR implemented in zwasm. LEB128 predecode and bytecode fusion done (Phase 37/45).
Decision: Compile hot integer arithmetic loops to native ARM64 machine code at runtime. Interpreter-integrated, single-loop cache, automatic deopt.
Architecture:
- Detection: Back-edge counter in vmRecurLoop. Threshold = 64 iterations.
- Compilation (jit.zig): analyzeLoop extracts loop ops, compileLoop emits ARM64. Supported ops: branch_ne/ge/gt (locals/const), add/sub (locals/const), recur_loop.
- NaN-box integration: SBFX unbox at entry, AND+ORR re-box at exit. used_slots bitset: only loads/checks slots referenced by loop body (skips closure self-ref).
- THEN path skip: analyzeLoop uses exit_offset from data word to jump past exit code, only analyzing the ELSE path (loop body).
- Execution: W^X transition (mmap WRITE → mprotect READ|EXEC), sys_icache_invalidate.
- JitState per VM: Single cached loop. maxInt(u32) sentinel prevents retry after deopt.
- Platform: ARM64 only (comptime check on builtin.cpu.arch == .aarch64). No-op on other architectures.
Results (hyperfine, ReleaseSafe, Apple M4 Pro):
| Benchmark | Before (37.3) | After (37.4) | Speedup |
|---|---|---|---|
| arith_loop | 31ms | 3ms | 10.3x |
| fib_recursive | 16ms | 16ms | 1.0x |
| (cumulative) | 53ms (base) | 3ms | 17.7x |
Scope limitation: PoC targets simple integer loops only. Not compiled: function calls, heap allocation, string ops, collection ops. fib_recursive uses recursion (not loop), so JIT does not apply.
Decision: Add call_target_frame field to VM to prevent exception handlers from
dispatching across VM/TreeWalk bridge boundaries.
Problem: When execution crosses VM→TW→VM boundaries (e.g. run-tests → do-testing
→ TW closure → derive throws), throw_ex dispatches to the nearest handler regardless
of call boundary. This causes an outer scope's try/finally handler (from binding in
do-testing) to intercept exceptions meant for inner scope's try/catch (from TW's
thrown?).
Architecture:
- call_target_frame: usize on VM, set by callFunction to current frame_count
- throw_ex: only dispatch to handler if handler.saved_frame_count > call_target_frame
- executeUntil: same scope check before error handler dispatch
- callFunction: errdefer restores sp, frame_count, and current_ns on error propagation, preventing stale frames from corrupting subsequent calls
Companion fix: Deferred var_ref resolution in bootstrap cache. var_ref constants
(e.g. (var *testing-contexts*)) are serialized with ns/var names but cannot be resolved
during readFnProtoTable (vars don't exist yet). Deferred fixup list resolves them after
restoreEnvState.
Files (ARM64 JIT decision): src/native/vm/jit.zig (new, ~700 lines), src/native/vm/vm.zig (JitState integration).
Decision: Reserve NanHeapTag slots 29 (big_int), 30 (ratio+big_decimal), 31 (array) in Group D for four Value types needed by Phase 43 (Numeric Types + Arrays).
Types:
- ZigArray: Mutable typed container (items: []Value, element_type: ElementType). ElementType enum: object, int, long, float, double, boolean, byte, short, char. Equivalent to JVM's Object[]/int[] etc. Identity equality (mutable).
- BigInt: Arbitrary precision integer backed by std.math.big.int.Managed. Structural equality via Const.eql(). Printed as <digits>N.
- Ratio: Exact rational as numerator/denominator BigInt pair. Structural equality. Printed as <num>/<den>.
- BigDecimal: Scaled BigInt (unscaled × 10^(-scale)). Shares NanHeapTag slot 30 with Ratio via NumericExtKind discriminator enum(u8) as first field of both extern structs. Printed as <digits>M.
GC: Array traces all items. BigInt marks struct only (limbs managed by allocator). Ratio marks struct + numerator/denominator BigInt pointers. BigDecimal marks struct + unscaled BigInt pointer.
Files: src/common/value.zig, src/common/collections.zig, src/common/gc.zig,
src/common/builtin/array.zig (new).
Decision: Defer full Wasm interpreter optimization to post-alpha. The recommended approach for future work is predecoded IR + tail-call threaded dispatch.
Research findings (Phase 44.5):
- Current: switch-based dispatch, inline LEB128 decode, lazy HashMap branch table
- Baseline: wasm_fib 7539ms, wasm_sieve 782ms, wasm_call 121ms
- Zig 0.15.2 supports @call(.always_tail, handler, ...), verified working
- Recommended approach: predecode bytecode → fixed-width IR (8 bytes/instr), then threaded dispatch via function pointer table + tail calls
- Expected impact: 40-60% improvement (2-3x for fib)
- Effort: HIGH (3177-line vm.zig, 200+ opcodes, control flow complexity)
Why defer: Alpha release priorities are correctness and documentation. The Clojure execution speed is already competitive (19/20 wins vs Babashka). Wasm speed is aspirational — users care about Clojure code speed first.
Post-alpha plan: Predecoded IR (eliminates LEB128 + bounds checks) → tail-call dispatch (eliminates branch misprediction) → superinstructions (fuse common patterns).
Decision: Restructure src/ from legacy common/native/ two-tier layout to pipeline-oriented structure where each compilation stage is a top-level directory.
Before: src/common/ (Reader, Analyzer, Compiler, Builtins, Value all mixed),
src/native/ (just VM + TreeWalk). Pipeline structure invisible from outside.
After:
src/
reader/ → Stage 1: Source → Form
analyzer/ → Stage 2: Form → Node
compiler/ → Stage 3: Node → Bytecode (was bytecode/)
vm/ → Stage 4a: Bytecode → Value
evaluator/ → Stage 4b: Node → Value (TreeWalk)
runtime/ → Core types + lifecycle (was common/ loose files)
builtins/ → Built-in functions (was common/builtin/)
regex/ → Regex engine
repl/ → nREPL + REPL (unchanged)
wasm/ → WebAssembly runtime (flattened from wasm/runtime/)
Merges: strings+clj_string → strings, io+file_io+java_io → io, arithmetic+numeric → arithmetic. 70 → 66 files.
Rationale: OSS release visibility. New contributors can see the compilation pipeline from the directory listing. The common/native split was a wasm_rt-era artifact with no current meaning.
Decision: Replace CW's internal wasm engine (9 files, ~9300 LOC) with zwasm
as a GitHub URL dependency (v0.1.0, https://github.com/clojurewasm/zwasm).
CW keeps a thin bridge file (src/wasm/types.zig) that wraps zwasm's public API
into CW's Value system.
Before: CW had a frozen copy of the wasm runtime (vm, store, module, instance,
opcode, predecode, memory, leb128, wasi) in src/wasm/. This was the Phase 35W
engine, missing Register IR, ARM64 JIT, and post-Phase 45 optimizations.
After:
src/wasm/
types.zig → Bridge: delegates to zwasm.WasmModule, keeps Value↔u64 marshalling
builtins.zig → Unchanged (imports from types.zig)
wit_parser.zig → Unchanged (CW-specific WIT handling)
Bridge design: WasmModule.inner: *zwasm.WasmModule delegation pattern.
Host function trampoline uses zwasm.Vm for stack access, zwasm.inspectImportFunctions
for import type resolution. The bridge handles Value↔u64 conversion, HostContext,
and Clojure imports map → []zwasm.ImportEntry translation.
Build: build.zig.zon GitHub URL dependency (v0.1.0 tag tarball).
zig build auto-fetches zwasm. Native targets only (wasm32-wasi does not link zwasm).
Benefits:
- -9300 LOC in CW (maintenance burden eliminated)
- CW automatically inherits zwasm improvements (Register IR, JIT, spec compliance)
- zwasm remains fully independent (no CW-specific code)
zwasm API additions (generic, not CW-specific):
- pub const Vm: re-export for embedder host function access
- inspectImportFunctions(): pre-analysis utility for import type metadata
Decision: Implement case* as a proper special form across the full pipeline
(Analyzer → Node → Compiler + TreeWalk), replacing the previous cond-based case
macro with the upstream case*/hash-dispatch design.
Node type: CaseNode (expr, shift, mask, default, clauses, test_type, skip_check).
Three test types: :int (integer identity), :hash-equiv (hash + equality),
:hash-identity (hash + identity for interned types like keywords).
Compiler: Equality-check chain — for each clause: dup expr, load constant,
eq, conditional jump. O(n) but correct. Future: switch to table jump for :compact.
TreeWalk: Hash-based dispatch — compute shift-masked hash, scan clauses for match, optional skip-check for hash collision buckets.
case macro: Ported from upstream. Uses prep-ints/prep-hashes to compute
optimal shift/mask parameters. Helper functions: shift-mask, maybe-min-hash,
case-map, fits-table?, prep-ints, merge-hash-collisions, prep-hashes.
Also fixed: Vector destructuring (makeNthCall) now uses 3-arity nth with
nil default, matching Clojure's behavior of returning nil for missing positions
instead of throwing.
Decision: Make MarkSweepGc thread-safe via a single gc_mutex that serializes
all allocation (msAlloc/msFree/msResize/msRemap) and collection (collectIfNeeded,
gcCollect = traceRoots + sweep) paths.
Design: Global GC lock approach — simplest correct implementation. The mutex is held for the entire mark+sweep cycle, preventing allocation during collection (stop-the-world). Multiple threads serialize on the mutex for allocation.
Thread registry: ThreadRegistry tracks active mutator thread count via
atomic counter. Infrastructure for future safe-point integration — when a thread
triggers collection, it will signal others to pause at safe points, wait for
all to reach safe points, then collect with combined root sets.
Scope: Phase 48.2 adds the mutex + registry. Thread spawning (48.3) will integrate safe-point coordination. Future optimization: concurrent marking, thread-local allocation buffers (TLABs), generational collection.
Problem: After D81 (bootstrap cache), Protocol and ProtocolFn values were not
serializable. restoreFromBootstrapCache called reloadProtocolNamespaces to
re-evaluate protocols.clj + reducers.clj (~440 lines) via TreeWalk at every startup,
causing 23.3ms startup time and 226MB memory usage.
Decision: Serialize Protocol and ProtocolFn values in the bootstrap cache. Protocol stores name + method_sigs + impls (nested map of type_key → method_map). ProtocolFn stores method_name + protocol var reference (ns + name), resolved via deferred fixup after env restore. Fn closure_bindings also serialized.
Cache invalidation: Protocol gains a generation counter, incremented on every
extend_type_method / extend-type call. ProtocolFn inline cache checks
cached_generation == protocol.generation to detect stale entries. This fixes a
latent bug where VM-compiled reify forms share compile-time type keys, causing the
monomorphic cache to return stale methods when the same type key gets new impls.
Also fixed extend_type_method to replace existing methods (same name) in the
method map instead of always appending.
Result: Startup 23.3ms → 5.3ms (4.4x), memory 226MB → 8.1MB (28x reduction). All upstream tests pass, no regression.
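The generation check on the inline cache can be sketched as below. This is a simplified Python model with hypothetical class and field layouts; only the `generation` / `cached_generation` comparison and the replace-on-extend behavior mirror the description above.

```python
class Protocol:
    def __init__(self):
        self.impls = {}       # type_key -> {method_name: fn}
        self.generation = 0

    def extend_type(self, type_key, methods):
        """Add or replace methods for a type and invalidate all caches."""
        self.impls.setdefault(type_key, {}).update(methods)
        self.generation += 1

class ProtocolFn:
    """A protocol method with a monomorphic inline cache."""

    def __init__(self, protocol, method_name):
        self.protocol = protocol
        self.method_name = method_name
        self.cached_type_key = None
        self.cached_method = None
        self.cached_generation = -1

    def __call__(self, type_key, *args):
        p = self.protocol
        hit = (self.cached_type_key == type_key
               and self.cached_generation == p.generation)
        if not hit:
            self.cached_method = p.impls[type_key][self.method_name]
            self.cached_type_key = type_key
            self.cached_generation = p.generation
        return self.cached_method(*args)
```

Without the generation comparison, re-extending the same type key would leave the cache pointing at the old method, which is the stale-entry bug the decision describes.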
Problem: User-defined lazy-seq thunks (non-Meta path) consumed VM call frames proportional to nesting depth. VM FRAMES_MAX was 256, limiting thunk-based lazy-seq chains to ~200 levels.
Decision:
- Increase VM FRAMES_MAX from 256 to 1024 (4x headroom)
- Add iterative unwrapping in LazySeq.realize: when a thunk returns another lazy-seq, loop instead of recursing (matches JVM LazySeq.seq() pattern)
- Keep TreeWalk MAX_CALL_DEPTH at 512 (Zig stack limited in Debug builds)
Depth limits after D96:
- Built-in map/filter/take (Meta path): effectively unlimited (Zig stack only)
- User-defined lazy-seq via VM: ~1000 levels
- User-defined lazy-seq via TreeWalk: ~500 levels
- D74 filter chain collapsing: unlimited nested filters (sieve)
Trade-off: VM struct grows ~78KB (frames array). No measurable impact on binary size, startup, RSS, or benchmarks.
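Iterative unwrapping can be illustrated with a small Python model. The `LazySeq` class here is a hypothetical stand-in, not the Zig type; it only shows how looping over nested thunks keeps the call depth constant.

```python
class LazySeq:
    def __init__(self, thunk):
        self.thunk = thunk
        self.value = None
        self.realized = False

    def realize(self):
        """Run the thunk; if it yields another lazy seq, keep unwrapping in a loop."""
        if not self.realized:
            result = self.thunk()
            while isinstance(result, LazySeq):
                # A realized inner seq already holds a fully unwrapped value.
                result = result.value if result.realized else result.thunk()
            self.value, self.realized, self.thunk = result, True, None
        return self.value

def nest(depth):
    """Build a chain of lazy seqs, each of which returns the next one."""
    seq = LazySeq(lambda: ("a", "b"))
    for _ in range(depth):
        seq = LazySeq(lambda inner=seq: inner)
    return seq
```

Each thunk runs one frame deep and returns before the next one starts, so a chain of 10,000 nested seqs realizes without approaching the interpreter's recursion limit.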
Problem: JVM Clojure's syntax-quote resolves unqualified symbols to fully
qualified names at read time (e.g., `foo becomes my.ns/foo). CW's reader
did not do this, causing macro expansions to produce unqualified symbols that
failed when evaluated in different namespaces.
Decision: Reader gets current_ns field. expandSyntaxQuote qualifies
unqualified symbols using current_ns (except special forms and auto-gensyms).
Scope: reader.zig (new field + qualifier logic), eval.zig (passes current ns to reader), bootstrap.zig (readFormsWithNs helper).
Problem: spec.alpha (~500 LOC) added to eager bootstrap increased startup from 4.2ms to 5.9ms, exceeding the 5ms threshold.
Decision: spec.alpha and spec.gen.alpha are embedded in the binary but NOT
loaded at startup. Instead, first (require '[clojure.spec.alpha :as s])
triggers loading via loadEmbeddedLib fallback in ns_ops.loadLib.
Trade-off: First require of spec.alpha has a one-time cost of ~1-2ms.
Startup stays at baseline (4.1ms). The binary still embeds the source, and its
size is roughly unchanged because the cache excludes spec.alpha serialization.
Problem: Sequential destructuring [a b & r] used nth for positional
access. This fails on maps (which are seqable but don't support nth),
breaking s/keys which uses [[k v] & ks :as keys] patterns on maps.
Decision: When & is present in a sequential destructuring pattern,
use seq/first/next chain instead of nth. Each (next seq_ref) gets
a fresh local variable slot to avoid stale references. Without &, the
original nth-based path is preserved for efficiency.
Matches JVM: Clojure's destructure uses seq/first/next when & is
present. The CW analyzer now does the same.
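The difference between the two access strategies can be sketched in Python, where an iterator plays the role of the seq/first/next walk. `destructure_seq` is a hypothetical helper, and missing positions bind to None as the closest analogue of nil.

```python
def destructure_seq(names, rest_name, coll):
    """Bind positional names, then the remainder, by walking coll once.

    Only iteration is required, so any seqable input works, including
    map entries that do not support positional indexing.
    """
    it = iter(coll)
    bindings = {}
    for name in names:
        bindings[name] = next(it, None)   # first, then advance (next)
    if rest_name is not None:
        rest = list(it)
        bindings[rest_name] = rest or None  # empty remainder binds to nil
    return bindings
```

Passing a dictionary's items works here even though indexing the dictionary by position would fail, which mirrors why the nth-based path broke on maps.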
Problem: Macro expansion via syntax-quote generates lazy sequences (concat/list*). During VM execution of the macro function, GC at safe points can sweep Values captured in lazy-seq thunk closures. After macro return, valueToForm encounters dangling pointers in the result tree.
Root causes identified (6 fixes):
- valueToForm didn't copy GC-allocated string data to node_arena
- ProtocolFn/MultiFn inline caches not traced by GC
- refer() stored GC-allocated symbol name pointers
- Protocol dispatch created 3 temporary HeapStrings per cache miss
- During macro callFnVal, lazy-seq closure-captured Values swept by GC
- During valueToForm, lazy-seq realization triggers GC while result tree unrooted
Decision: Suppress GC collection during the entire expandMacro scope
(callFnVal + valueToForm). Added suppress_count to MarkSweepGc.
Collection is deferred, not skipped — next safe point after unsuppress
will collect normally. Macro expansion allocations are bounded (result
tree size), so memory pressure is acceptable.
Additionally: getByStringKey on PersistentArrayMap eliminates all temporary HeapString allocations in protocol dispatch hot path.
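The suppression counter can be modeled as follows. This Python sketch uses hypothetical names (`Gc`, `suppressed`, `safe_point`) and replaces the real mark+sweep with a counter reset; it shows that nested suppression scopes defer collection until the outermost scope exits.

```python
from contextlib import contextmanager

class Gc:
    def __init__(self, threshold):
        self.threshold = threshold
        self.live = 0
        self.suppress_count = 0
        self.collections = 0

    def alloc(self):
        self.live += 1

    def safe_point(self):
        """Collect only when over threshold and no scope is suppressing."""
        if self.suppress_count == 0 and self.live >= self.threshold:
            self.live = 0          # stand-in for a real mark+sweep
            self.collections += 1

@contextmanager
def suppressed(gc):
    """Defer collection for the duration of the block (re-entrant)."""
    gc.suppress_count += 1
    try:
        yield
    finally:
        gc.suppress_count -= 1
```

A counter rather than a boolean keeps nested macro expansions correct: an inner expansion finishing does not re-enable collection while an outer one is still building its result tree.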
Problem: Java interop code was scattered across analyzer.zig (rewrite tables), strings.zig (javaMethodFn), predicates.zig, arithmetic.zig, system.zig, and registry.zig. Adding a new Java class required changes in 3+ files. URI, File, and UUID classes were needed for library compatibility (hiccup, web apps, scripts).
Decision: Extract interop into a dedicated src/interop/ module:
src/interop/
rewrites.zig -- Static field + method rewrite tables
dispatch.zig -- Instance method dispatch (__java-method)
constructors.zig -- Constructor dispatch (__interop-new)
classes/
uri.zig -- java.net.URI
file.zig -- java.io.File
uuid.zig -- java.util.UUID
Object model: Java class instances = PersistentArrayMap with
:__reify_type keyword key (e.g., {:__reify_type "java.util.UUID" :uuid "550e8400-..."}). No new Value tags needed. Works with GC (maps
are traced). type builtin reads :__reify_type. str delegates to
.toString() for class instances. prn prints tagged literals
(e.g., #uuid "..." for UUID).
Constructor syntax: (ClassName. args...) and (new ClassName args...)
are analyzer rewrites to (__interop-new "fqcn" args...). :import
stores FQCN: (:import (java.net URI)) → (def URI 'java.net.URI).
Adding new classes: 1 new file in classes/ + 1 registration in
constructors.zig + dispatch.zig + rewrites.zig. Down from 3+ files.
Problem: readFnProtoTable in serialize.zig assigned an uninitialized
pointer array to self.fn_protos before populating it. When a FnProto's
constant pool contained a fn_val referencing a higher-indexed proto
(forward reference), the fn_val got an uninitialized proto pointer
(0xaaaaaaaaaaaaaaaa). This caused GC crashes under heavy allocation
pressure when tracing fn_val chains. Triggered by cl-format adding ~50
closures to the pprint namespace.
Fix: Two-pass deserialization. Pass 1: pre-allocate all FnProto structs and populate the pointer array. Pass 2: deserialize content into pre-allocated structs. All proto pointers are now valid before any constant pool deserialization begins.
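The two-pass approach can be sketched in Python. The record layout (`{"proto": index}` for a function constant) and the names `FnProto` and `read_fn_proto_table` are illustrative assumptions, not the actual cache format.

```python
class FnProto:
    def __init__(self):
        self.name = None
        self.constants = []

def read_fn_proto_table(records):
    """Deserialize protos so that forward references always resolve."""
    # Pass 1: allocate every proto up front; all indices are valid from here on.
    protos = [FnProto() for _ in records]
    # Pass 2: fill in content; constants may point at any index, higher or lower.
    for proto, rec in zip(protos, records):
        proto.name = rec["name"]
        proto.constants = [
            protos[c["proto"]] if isinstance(c, dict) else c
            for c in rec["constants"]
        ]
    return protos
```

In a single-pass reader, the reference from index 0 to index 1 would be read before its target existed, which corresponds to the uninitialized pointer described above.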
Decision: Add -Dwasm=false compile-time feature flag to exclude zwasm
dependency entirely, producing a smaller binary for users who don't need Wasm FFI.
Implementation: Zig comptime branching in 8 files (types.zig, builtins.zig,
wit_parser.zig, registry.zig, main.zig, deps.zig, root.zig, nrepl.zig). When
enable_wasm=false, zwasm is never @import'd (lazy analysis ensures no linker
dependency). Value enum tags (wasm_module=26, wasm_fn=27) are retained for
serialization compatibility. DCE handles unreachable dispatch paths in vm.zig,
tree_walk.zig, gc.zig automatically.
Result: Default 4.25MB → wasm=false 3.68MB (-570KB, -13%).
Decision: Defer deserialization of non-essential namespaces from startup to
require time. Only clojure.core, clojure.core.protocols, and user are
restored eagerly. The remaining 12 eager namespaces (walk, template, test, set,
data, repl, java.shell, java.io, pprint, stacktrace, zip, core.reducers) are
recorded as deferred entries with byte offsets into the bootstrap cache.
Cache format: Unchanged binary format. The Deserializer scans each NS at
startup but only fully restores essential ones — non-essential ones are skipped
via skipNamespaceData() (which parses binary structure to advance the offset)
and their start positions recorded in a module-level deferred_ns_entries map.
Key design choices:
- Module-level globals for deferred state (Deserializer is stack-local, goes out of scope)
@embedFiledata has static lifetime — no copy needed for deferred reads- FnProto table fully deserialized at startup (shared across all NS); unresolvable
var_refs moved to
global_deferred_var_refslist viaresolveOrDeferVarRefs() - Recursive dependency resolution:
restoreDeferredNscallsrestoreFromDeferredCachewhen a refer/alias target NS is itself deferred. Entry removed before call to prevent cycles. resolveGlobalDeferredRefs()called after each deferred NS restoration to resolve newly-available var_refs and protocol fns.
Result: Startup 5.2ms → 4.6ms (-12%). RSS 9.3MB → 7.4MB (-20%).
Note: D104 will be removed in Phase 83E when all core NS are Zig builtins and bytecode deserialization is eliminated entirely.
Decision: Major architectural evolution across 5 phases (83A-83E).
Full design: .dev/interop-v2-design.md.
Motivation: D101 Java InterOp works but has structural issues: fragmented registration (5+ files per class), silent nil on unknown methods, Exception. returning raw string, byte-level string ops, handle safety gaps. Additionally, .clj bootstrap adds startup cost and CLJW marker maintenance.
Changes:
- Exception Unification (83A): (Exception. "msg") → map with :__ex_info. Comptime exception hierarchy table. isSubclassOf for catch dispatch. Unknown methods → error.
- ClassDef Registry (83B): Single ClassDef struct per Java class. One registry, consulted by analyzer + dispatcher + instance? + constructors. Protocol-based method dispatch. Method Missing → error. Supersedes D101's 5-file registration pattern.
- UTF-8 Codepoint (83C): String index operations use codepoint semantics. Internal UTF-8 representation unchanged. std.unicode.Utf8Iterator for indexing.
- Handle Safety (83D): Closed flag, use-after-close detection, GC finalization.
- All-Zig Core (83E): All standard-library functions → Zig builtins. .clj loading reserved for user code and libraries only. Eliminates bytecode deserialization (D104 becomes unnecessary), CLJW markers, VM interpretation overhead for core functions.
Object model: Unchanged — class instances remain PersistentArrayMap with
:__reify_type. The change is in how classes are defined and dispatched.
Migration invariant: All tests pass after every sub-task. Incremental, defensive migration with benchmarks recorded at milestones.
Decision: Support :extend-via-metadata true on defprotocol. When set,
protocol dispatch checks (meta obj) for FQ symbol key (e.g. ns/method-name)
BEFORE the inline cache and impls map lookup.
Motivation: JVM Clojure feature used by Datafiable and Navigable in
clojure.core.protocols. Without it, these protocols can't be extended via
metadata, breaking upstream compatibility.
Implementation:
- DefProtocolNode and Protocol gain extend_via_metadata: bool and defining_ns: ?[]const u8
- Analyzer parses :extend-via-metadata true keyword option in defprotocol
- Compiler encodes flag as first element of sigs vector
- Dispatch order (both backends): metadata → inline cache → impls → "Object" fallback
- Metadata dispatch is per-object (not cached): two maps with same type can have different metadata, so the type-based inline cache must be bypassed
- protocols.clj Datafiable/Navigable re-enabled with :extend-via-metadata true
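The dispatch order can be sketched as a single lookup function. This Python model uses a plain dict for the protocol and Python type names as type keys, both of which are assumptions for illustration; only the ordering (metadata, inline cache, impls, "Object" fallback) follows the list above.

```python
def dispatch(protocol, method, obj, meta=None):
    """Resolve a protocol method for obj, checking metadata first."""
    # 1. Per-object metadata, keyed by fully qualified method symbol. Never cached.
    if protocol["extend_via_metadata"] and meta:
        fn = meta.get(f'{protocol["ns"]}/{method}')
        if fn is not None:
            return fn
    # 2. Type-based inline cache.
    type_key = type(obj).__name__
    cache = protocol["cache"]
    if cache.get("key") == (type_key, method):
        return cache["fn"]
    # 3. Impls map, then 4. the "Object" fallback.
    impls = protocol["impls"]
    fn = impls.get(type_key, {}).get(method) or impls.get("Object", {}).get(method)
    if fn is None:
        raise LookupError(f"no implementation of {method} for {type_key}")
    cache.update(key=(type_key, method), fn=fn)
    return fn
```

Because the metadata check runs before the cache, two maps of the same type can resolve to different implementations even after the type-based entry has been cached.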
Decision: Self-describing NamespaceDef struct + generic registration/loading.
Motivation: 10 concrete problems — 480 lines of copy-pasted registration boilerplate, 10 hand-written loadXxx() functions, loadEmbeddedLib() string-comparison if-chain, mixed naming conventions, split namespace responsibility.
Implementation:
- NamespaceDef struct in registry.zig: name, builtins, macro_builtins, dynamic_vars, constant_vars, loading (pure_zig/eager_eval/lazy), embedded_source, extra_refers, extra_aliases, post_register, enabled
- registerNamespace(): generic function replacing 20+ copy-pasted blocks
- lib/defs.zig: aggregates all library NamespaceDef entries
- inline for (all_namespace_defs) loop in registerBuiltins()
- ns_loader.zig: generic loadNamespaceClj() + loadLazyNamespace() replaces hand-written loadXxx and loadEmbeddedLib if-chain
- 30 lib/*.zig files (one per non-core namespace)
- ~470 lines of boilerplate removed from registry.zig + bootstrap.zig
Status: Phases R1-R3 + R6 complete. R4 (core/ file moves) and R5 (requireLib extraction) deferred — purely organizational, no functional impact.
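
As a rough illustration of the self-describing approach, the following Python sketch models a cut-down NamespaceDef and one generic registration function. It is not the Zig implementation: the field subset, the `env` dictionary, the `eval_source` callback, the `example.*` namespace names, and the assumption that a disabled namespace is skipped entirely are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Loading(Enum):
    PURE_ZIG = "pure_zig"
    EAGER_EVAL = "eager_eval"
    LAZY = "lazy"


@dataclass
class NamespaceDef:
    """Subset of the fields listed above; the rest are omitted for brevity."""
    name: str
    builtins: Dict[str, Callable] = field(default_factory=dict)
    loading: Loading = Loading.PURE_ZIG
    embedded_source: Optional[str] = None
    enabled: bool = True


def register_namespace(env: dict, ns_def: NamespaceDef, eval_source: Callable) -> None:
    """One generic function instead of one hand-written block per namespace."""
    if not ns_def.enabled:
        return  # assumption: disabled namespaces are not registered at all
    env.setdefault(ns_def.name, {}).update(ns_def.builtins)
    if ns_def.loading is Loading.EAGER_EVAL and ns_def.embedded_source is not None:
        eval_source(ns_def.name, ns_def.embedded_source)
    # LAZY: the embedded source is left alone until the namespace is required.


def register_builtins(env: dict, all_defs: List[NamespaceDef], eval_source: Callable) -> None:
    # Stands in for the `inline for (all_namespace_defs)` loop.
    for ns_def in all_defs:
        register_namespace(env, ns_def, eval_source)


evaluated: List[str] = []
env: dict = {}
all_namespace_defs = [
    NamespaceDef("example.pure", builtins={"upper-case": str.upper}),
    NamespaceDef("example.eager", loading=Loading.EAGER_EVAL, embedded_source="(ns example.eager)"),
    NamespaceDef("example.lazy", loading=Loading.LAZY, embedded_source="(ns example.lazy)"),
    NamespaceDef("example.disabled", enabled=False),
]
register_builtins(env, all_namespace_defs, lambda ns, src: evaluated.append(ns))
```

Adding a namespace then means appending one entry to the list; the registration loop itself does not change.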
Decision: CW is a pure Zig Clojure runtime. No self-hosting philosophy. All namespace implementations converge to self-contained Zig modules.
Principles:
- Zero embedded Clojure: Eliminate all evalString bootstrap. No .clj in the processing pipeline. LoadStrategy eager_eval → extinct. All vars are Zig builtins or Zig-registered macros.
- 1 NS = 1 File: Each lib/*.zig contains both NamespaceDef AND full implementation. No separate ns_*.zig files. Single touch-point per NS.
- Upstream mapping: Clear Clojure NS/var → Zig file:function mapping. Enables fast upstream change tracking.
- Behavioral compat, not structural compat: Upstream .clj implementations are reference for behavior, not structure. Zig implementations may differ internally for performance, binary size, or startup optimization.
- No self-hosting: Unlike JVM Clojure, CW does not define Clojure in Clojure. .clj files are for user code only.
Motivation: Embedded Clojure strings are fragile (no type checking, no tooling, startup cost from evalString). The ns_*.zig + lib/*.zig split creates redundant touch-points. Pure Zig enables comptime optimization, dead code elimination, and minimal binary size.
Migration path (incremental, regression-safe):
- Merge ns_*.zig → lib/*.zig (1 NS = 1 File consolidation)
- Phase B.15-B.16: Convert remaining embedded Clojure to Zig
- Phase C: Eliminate bootstrap pipeline
- Phase E: Optimize (binary size, startup, benchmarks)
Status: In progress. 18 ns_*.zig files to merge into lib/*.zig.
Decision: Strict 4-zone layered architecture with enforced dependency direction.
Zones:
Layer 0: src/runtime/ — foundational types (Value, collections, Env, GC, Namespace, Var)
NO imports from engine/, lang/, or app/
Layer 1: src/engine/ — processing pipeline (Reader, Analyzer, Compiler, VM, TreeWalk)
imports runtime/ only
Layer 2: src/lang/ — Clojure language (builtins, interop, lib namespaces)
imports runtime/ + engine/
Layer 3: src/app/ — application (main, CLI, REPL, deps, Wasm)
imports anything
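
The dependency direction can be expressed as a simple ordering check. The Python sketch below is only an illustration of the rule, assuming paths of the form src/<zone>/...; the function names are hypothetical and this is not the contents of the project's rules file or any real lint tool.

```python
from typing import Optional

# Layer numbers as listed above; a zone may import its own layer or lower.
ZONE_LAYER = {"runtime": 0, "engine": 1, "lang": 2, "app": 3}


def zone_of(path: str) -> Optional[str]:
    """Return the zone name for a path like 'src/engine/vm/vm.zig', else None."""
    parts = path.split("/")
    if len(parts) >= 3 and parts[0] == "src" and parts[1] in ZONE_LAYER:
        return parts[1]
    return None


def is_upward_import(importer: str, imported: str) -> bool:
    """True when `importer` reaches into a higher layer than its own."""
    src_zone, dst_zone = zone_of(importer), zone_of(imported)
    if src_zone is None or dst_zone is None:
        return False  # files outside the four zones are not checked here
    return ZONE_LAYER[dst_zone] > ZONE_LAYER[src_zone]


# A runtime file importing the VM is exactly the kind of edge the layering forbids.
violation = is_upward_import("src/runtime/bootstrap.zig", "src/engine/vm/vm.zig")
allowed = is_upward_import("src/engine/vm/vm.zig", "src/runtime/value.zig")
```
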
Core technique: callFnVal vtable (function pointer table in runtime/dispatch.zig).
Eliminates runtime/ → engine/ dependency (bootstrap.zig imported TreeWalk + VM).
Engine/ sets function pointers at startup. Runtime/ calls through vtable.
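
The vtable idea is ordinary dependency inversion: the lower layer owns a slot, the upper layer fills it. A minimal Python sketch follows, with a dictionary standing in for the function pointer table; the names `install_call_fn_val`, `runtime_apply`, and `vm_call_fn_val` are hypothetical and only the direction of the calls reflects the decision.

```python
from typing import Any, Callable, Dict, Optional, Sequence

CallFn = Callable[[Any, Sequence[Any]], Any]

# --- "runtime" layer: owns the table, imports nothing from the engine ---
_vtable: Dict[str, Optional[CallFn]] = {"call_fn_val": None}


def install_call_fn_val(impl: CallFn) -> None:
    _vtable["call_fn_val"] = impl


def runtime_apply(fn_val: Any, args: Sequence[Any]) -> Any:
    """Runtime code calls functions through the table, never the engine directly."""
    impl = _vtable["call_fn_val"]
    if impl is None:
        raise RuntimeError("callFnVal not installed")
    return impl(fn_val, args)


# --- "engine" layer: knows about the runtime and fills in the slot at startup ---
def vm_call_fn_val(fn_val: Any, args: Sequence[Any]) -> Any:
    return fn_val(*args)


def engine_startup() -> None:
    install_call_fn_val(vm_call_fn_val)


try:
    runtime_apply(lambda a, b: a + b, [1, 2])
    failed_before_install = False
except RuntimeError:
    failed_before_install = True

engine_startup()
result = runtime_apply(lambda a, b: a + b, [1, 2])
```

The cost is one indirect call per invocation and a startup-order requirement: anything that calls through the table must run after the engine has installed its pointers.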
Motivation: CW's runtime/ had 112 upward imports (95 to builtins/, 2 to evaluator/, 5 to compiler/, 2 to vm/, 4 to analyzer/, 4 to reader/) — all caused by bootstrap.zig (3,624 LOC God object) and eval_engine.zig (2,556 LOC). NextClojureWasm's strict 3-zone model demonstrated that layered architecture prevents circular dependencies and makes refactoring tractable.
Plan: .dev/refactoring-plan.md (sub-tasks R0-R12).
Rules: .claude/rules/zone-deps.md (auto-loads on src/ edits).
Result targets: 0 upward imports, bootstrap.zig < 200 LOC, main.zig < 200 LOC.
Date: 2026-03-08
Status: Future (depends on zwasm D128)
Decision: When zwasm implements allocator injection (D128), CW will pass its own GC-managed allocator to zwasm instead of letting zwasm use an internal Arena. This eliminates the dual-GC lifecycle mismatch.
Current problem: CW GC (Mark-Sweep) manages wasm Value objects (wasm_module, wasm_fn, wasm_instance). When CW GC sweeps these, it frees the CW-side wrapper, but zwasm's internal Arena retains the underlying memory. The Arena only frees on full deinit (process exit), so long-running CW processes that load/unload Wasm modules will leak zwasm-side memory.
Target state: CW passes its allocator to zwasm.Engine.init(cw_allocator).
zwasm allocations become CW GC-visible. When CW GC sweeps a wasm Value, the
underlying zwasm memory is also reclaimable.
Scope: Only zwasm's internal bookkeeping (module metadata, function tables, instance state). Wasm linear memory remains separately managed per spec.
Migration: Minimal CW changes — update wasm_types.zig to pass allocator
at Engine construction. Requires zwasm D128 to be implemented first.
Related: zwasm D128, cw-new D13.
Date: 2026-04-27
Status: Done
Decision: Migrate the entire ClojureWasm tree from Zig 0.15.2 to 0.16.0,
together with bumping zwasm to v1.11.0 (the first 0.16-compatible tag).
Centralize the new std.Io model behind a process-wide accessor module
runtime/io_default.zig so existing module-level mutexes, time helpers,
env lookups, and sleeps don't have to thread io through every call site.
Why now: Zig 0.16 reshapes std.Io (Mutex/Condition/sleep/Timestamp
all take io: Io), removes std.fs.cwd (replaced by std.Io.Dir), removes
std.posix.{getenv,write,isatty}, and changes pub fn main() to
pub fn main(init: std.process.Init). Staying on 0.15.2 indefinitely
forfeits stdlib improvements and forces zwasm to maintain a parallel branch.
Approach:
- zwasm-first vs detach-then-reattach: chose to upgrade zwasm to v1.11.0 from the start (rejected the original "detach + Phase 6 reattach" plan). Reason: v1.11.0 is already 0.16-ready, so keeping zwasm in saved a whole reattach phase and let wasm e2e/bridge tests stay green throughout.
- io_default module: production entry points (main, cache_gen) call io_default.set(init.io) at startup, so all module-level mutexes / Condition variables / nanoTimestamp / sleep / getenv pick up the real cancelable io. Tests fall through to a process-wide std.Io.Threaded.init_single_threaded default, except for the few that need real spawn semantics (shell tests), which install a local Threaded.
- libc linkage: zwasm v1.11.0 enables link_libc = true by default (D135 in zwasm). CW inherits the libc-linked binary; we use std.c.getenv / std.c.realpath / std.c.write / std.c.mprotect / std.c.getcwd in places where stdlib equivalents were removed. Stripping libc back out is a follow-up (F##; cf. zwasm's W46 sequence).
- temporary stubs: HTTP server, nREPL, fancy line editor, and cljw build rely on std.net / std.posix.poll / raw-mode termios / std.fs.selfExePath, all gone or reshaped in 0.16. The full rewrite to std.Io.net + Smith fuzzing is non-trivial and was scoped out of this migration. Each is stubbed with a clear runtime error and tracked as a separate F## item.
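
The io_default accessor is a "set once at startup, fall back to a default otherwise" pattern. The Python sketch below shows only that shape; the `Io` class, `set_io`, `get_io`, and `now_label` are hypothetical stand-ins and do not mirror the Zig std.Io API.

```python
from typing import Optional


class Io:
    """Hypothetical stand-in for an io handle."""

    def __init__(self, label: str) -> None:
        self.label = label


# Process-wide fallback, used when no entry point has installed a real io
# (the situation most tests are in).
_DEFAULT_IO = Io("single-threaded default")
_current_io: Optional[Io] = None


def set_io(io: Io) -> None:
    """Called once by a production entry point at startup."""
    global _current_io
    _current_io = io


def get_io() -> Io:
    return _current_io if _current_io is not None else _DEFAULT_IO


def now_label() -> str:
    # A call site deep in the runtime: it has no io parameter in its
    # signature and asks the accessor instead.
    return get_io().label


before = now_label()
set_io(Io("from main"))
after = now_label()
```

The trade-off is the usual one for process-wide state: call sites stay unchanged, but which io they see depends on whether the entry point has run `set_io` yet.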
Verification: 1324/1324 unit tests, 83/83 cljw test namespaces, 6/6 wasm
e2e, deps.edn e2e all green on macOS aarch64. Bench history records
pre-zig-016 and post-zig-016 entries; no individual benchmark regressed
beyond noise; lazy_chain actually improved.
Related: zwasm D135 (Vm.io infra), Phase 7 follow-ups in .dev/checklist.md.