Design Decisions

Architectural decisions for ClojureWasm. Reference by searching ## D##. Only architectural decisions (new Value variant, subsystem design, etc.) — not bug fixes.

Pruned 2026-02-08: removed 61 historical/superseded/implementation-detail entries. See git history for full archive.


D3: Instantiated VM — No Threadlocal from Day One

Decision: VM is an explicit struct instance passed as parameter. No global or threadlocal state anywhere.

Rationale (.dev/future.md §15.5):

  • Beta used 8 threadlocal variables in defs.zig, making embedding impossible
  • Instantiated VM enables: multiple VMs in one process, library embedding mode, clean testing (each test gets fresh VM)

Known exceptions: macro_eval_env (D15), predicates.current_env (T9.5.5), bootstrap.last_thrown_exception, keyword_intern.table, collections._vec_gen_counter (24C.4), lifecycle.shutdown_requested/hooks (34.5), http_server.build_mode/background_mode/bg_server (34.2)


D6: Dual Backend with --compare from Phase 2

Decision: Implement TreeWalk evaluator alongside VM from Phase 2. Wire --compare mode immediately.

Rationale (.dev/future.md §9.2):

  • Beta's --compare mode was "the most effective bug-finding tool"
  • TreeWalk is simpler to implement correctly (direct Node -> Value)
  • VM bugs often produce wrong values silently (not crashes)

Development rule (enforced from Phase 3 onward): When adding any new feature (builtin, special form, operator), implement it in both backends and add an EvalEngine.compare() test.

Component   Path (pre-R8)                Path (post-R8)
VM          src/vm/vm.zig                src/engine/vm/vm.zig
TreeWalk    src/evaluator/tree_walk.zig  src/engine/evaluator/tree_walk.zig
EvalEngine  src/runtime/eval_engine.zig  src/engine/eval_engine.zig

D10: English-Only Codebase

Decision: All source code, comments, commit messages, PR descriptions, and documentation are in English.

Rationale: OSS readiness from day one. Beta used Japanese comments/commits, which limited accessibility. Agent response language is personal preference (configured in ~/.claude/CLAUDE.md).


D12: Division Semantics — Float Now, Ratio Later

Decision: The / operator returns Ratio for non-exact int division, matching JVM. The original decision was float now, Ratio later; the Ratio type has since been implemented (F3 resolved, see D89).

Original semantics — Clojure JVM: (/ 6 3) → 2 (Long), (/ 1 3) → 1/3 (Ratio). ClojureWasm (pre-Ratio): (/ 6 3) → 2.0 (float), (/ 1 3) → 0.333... (float).

Ratio was implemented once tests failed due to precision loss from the float approximation.


D36: Unified fn_val dispatch via callFnVal

Decision: Single callFnVal(allocator, fn_val, args) function in bootstrap.zig. Routes by Value tag and Fn.kind:

  • builtin_fn → direct call
  • fn_val(.bytecode) → bytecodeCallBridge (creates new VM instance)
  • fn_val(.treewalk) → treewalkCallBridge (creates new TreeWalk)
  • multi_fn, keyword, map, set → IFn dispatch

All call sites import bootstrap.callFnVal directly (no callback fields/module vars).
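As an illustrative sketch of the single-entry-point idea (a Python simulation with invented dict-based tags and field names — the real implementation is Zig code in bootstrap.zig):

```python
# Hypothetical model of D36: one dispatcher inspects the callee's tag
# (and Fn.kind for fn_val) and routes to the right call path.

def run_bytecode(f, args):
    # stands in for bytecodeCallBridge (would run a VM instance)
    return ("bytecode", f["name"], args)

def run_treewalk(f, args):
    # stands in for treewalkCallBridge (would run a TreeWalk evaluator)
    return ("treewalk", f["name"], args)

def call_fn_val(fn_val, args):
    tag = fn_val["tag"]
    if tag == "builtin_fn":
        return fn_val["ptr"](*args)            # direct native call
    if tag == "fn_val":
        # route on Fn.kind: bytecode vs treewalk bridge
        bridge = {"bytecode": run_bytecode, "treewalk": run_treewalk}[fn_val["kind"]]
        return bridge(fn_val, args)
    if tag == "keyword":                       # IFn dispatch: (:k m) == lookup
        return args[0].get(fn_val["name"])
    raise TypeError(f"not callable: {tag}")
```

The point is that no call site needs callback fields: everything funnels through one function that understands every callable tag.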


D56: VM Closure Capture — Per-Slot Array

Decision: Replace contiguous capture_base + capture_count with capture_slots: []const u16 in FnProto. Each slot index is recorded individually, allowing capture from arbitrary non-contiguous stack positions.

Rationale: Contiguous capture failed when locals occupied non-contiguous stack slots (e.g., self-ref at slot 0, let binding at slot 2, nothing at slot 1).


D62: Transducer Foundation

Decision: Transducer support via 1-arity map/filter, extended conj/deref, and transduce using plain reduce (not protocol-based coll-reduce).

Key functions: transduce, into (3-arity), cat, halt-when, dedupe, preserving-reduced, sequence (1-arity).

halt-when uses :__halt instead of ::halt (auto-qualified keywords not supported).
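The "plain reduce, no coll-reduce protocol" shape can be sketched in a few lines of Python (an illustrative simulation, not CW's core.clj code): a transducer is a function from reducing-fn to reducing-fn, and transduce is just reduce with the transformed reducing function.

```python
from functools import reduce

def map_xf(f):
    # 1-arity map: returns a transducer (rf -> rf)
    return lambda rf: lambda acc, x: rf(acc, f(x))

def filter_xf(pred):
    # 1-arity filter: passes x through only when pred holds
    return lambda rf: lambda acc, x: rf(acc, x) if pred(x) else acc

def comp(*xfs):
    # right-to-left composition, so the first xf transforms elements first
    return lambda rf: reduce(lambda g, xf: xf(g), reversed(xfs), rf)

def transduce(xform, rf, init, coll):
    # no protocol dispatch: just plain reduce with the wrapped reducing fn
    return reduce(xform(rf), coll, init)

xf = comp(map_xf(lambda x: x + 1), filter_xf(lambda x: x % 2 == 0))
total = transduce(xf, lambda acc, x: acc + x, 0, range(5))
```

Here `total` sums the even values of `(map inc (range 5))`, i.e. 2 + 4.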


D63: Error System — Threadlocal (Supersedes D3a)

Decision: Threadlocal error state (same pattern as Beta). Module-level functions setError(), setErrorFmt(), getLastError(), setSourceText().

Rationale: Instance-based ErrorContext (D3a) caused error info loss — context lived on evalString()'s stack, out of scope when errors propagated to main(). Threadlocal eliminates the scope boundary problem. Single-threaded execution means no thread safety concerns.


D65: Lazy Sequence Infrastructure

Decision: Core seq functions (map, filter, take, take-while, concat, range, mapcat) use lazy-seq/cons in core.clj. realizeValue() in collections.zig handles transparent lazy→eager conversion at system boundaries.

Realize boundaries: eqFn/neqFn, VM .eq/.neq opcodes, print/pr/println/prn, str/pr-str, valueToForm, withMetaFn.


D68: Namespace-Isolated Function Execution

Decision: Capture defining namespace on Fn objects and restore during function calls. Unqualified symbol resolution happens in the defining namespace.

  • value.zig: Fn.defining_ns: ?[]const u8
  • vm.zig: CallFrame.saved_ns saves/restores env.current_ns
  • tree_walk.zig: makeClosure/callClosure save/restore namespace

Rationale: JVM Clojure captures Var references at compile time. Our runtime-resolved approach caused cross-namespace shadowing.


D69: Mark-Sweep GC Allocator (Phase 23)

Decision: MarkSweepGc in src/common/gc.zig using HashMap-based allocation tracking (keyed by pointer address).

  • Provides std.mem.Allocator interface (alloc/resize/remap/free vtable)
  • Provides GcStrategy interface (alloc/collect/shouldCollect/stats vtable)
  • HashMap uses backing allocator (not GC allocator) to avoid circular dependency
  • Allocation threshold controls shouldCollect() trigger

D70: Three-Allocator Architecture (Phase 23.5)

Decision: Three allocator tiers:

  • GPA (infra_alloc): Env, Namespace, Var, HashMap backings — stable infrastructure
  • node_arena (GPA-backed ArenaAllocator in Env): Reader Forms, Analyzer Nodes — AST data referenced by TreeWalk closures, persists for program lifetime
  • GC allocator (gc_alloc): Values (Fn, collections, strings) — mark-sweep collected

Rationale: GC sweep frees ALL unmarked allocations. AST Nodes are not Values and cannot be traced by the GC.


D71: Heap-Allocated VM Struct

Decision: Always heap-allocate VM structs (via allocator.create(VM)). The VM struct is ~1.5MB (NaN-boxed: ~256KB) due to fixed-size operand stack. Stack-allocated VMs cause native stack overflow in nested calls.


D73: Two-Phase Bootstrap — TreeWalk + VM Hot Recompilation

Decision: Two-phase bootstrap in loadCore:

  1. Phase 1: Evaluate core.clj via TreeWalk (fast startup, all functions defined)
  2. Phase 2: Re-evaluate hot transducer functions (map, filter, comp) via VM compiler, replacing TreeWalk closures with bytecode closures.

evalStringVMBootstrap: Compiles via Compiler+VM, does NOT deinit — FnProtos must persist because they are stored in Vars.

Trade-off: transduce 2134→15ms (142x), startup +5ms.


D74: Filter Chain Collapsing + Active VM Call Bridge

Decision: Flatten nested filter chains + reuse active VM in callFnVal.

  1. Filter chain collapsing (value.zig): lazy_filter_chain Meta variant stores flat []const Value of predicates + source. Avoids 168 levels of recursive realize() for sieve-like programs.

  2. Active VM call bridge (bootstrap.zig): callFnVal checks vm_mod.active_vm before allocating a new VM. Eliminates ~500KB heap allocation per call.

Result: sieve 1645→21ms (78x), memory 2997→24MB (125x).
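The flat-predicate idea behind filter chain collapsing can be sketched as follows (a Python illustration with invented names; the real variant lives in value.zig's Meta):

```python
# Instead of nesting one lazy filter object per predicate (realization then
# recurses once per level), keep a flat list of predicates over one source.

class FilterChain:
    def __init__(self, preds, source):
        self.preds, self.source = preds, source

def add_filter(pred, coll):
    if isinstance(coll, FilterChain):
        # collapse: append to the flat list rather than wrapping
        return FilterChain(coll.preds + [pred], coll.source)
    return FilterChain([pred], coll)

def realize(chain):
    # a single pass; depth no longer grows with the number of filters
    return [x for x in chain.source if all(p(x) for p in chain.preds)]

c = add_filter(lambda x: x % 3 != 0,
               add_filter(lambda x: x % 2 != 0, range(10)))
```

A sieve builds exactly this shape — one new predicate per prime — so collapsing turns deep recursion into one flat scan.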


D76: Wasm InterOp Value Variants — wasm_module + wasm_fn

Decision: Two Value variants for Wasm FFI:

  • wasm_module: *WasmModule — heap-allocated, owns Store/Module/Instance
  • wasm_fn: *const WasmFn — bound export name + signature, callable via callFnVal

Namespace: cljw.wasm (D82), registered in registry.zig.

Type conversion: integer↔i32/i64, float↔f32/f64, boolean/nil→i32(0/1).


D77: Host Function Injection — Clojure→Wasm Callbacks

Decision: Global trampoline + context table (256 slots) for host function injection.

  • (wasm/load "m.wasm" {:imports {"env" {"log" clj-fn}}}) registers Clojure fns
  • HostContext stores: Clojure fn Value, param/result counts, allocator
  • Single hostTrampoline(vm, ctx_id) handles all callbacks

Rationale: Context table (vs closures) because Zig closures cannot be passed as fn pointers.
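The trampoline-plus-table shape can be simulated like this (illustrative Python with hypothetical names; the real trampoline receives a Wasm VM and ctx_id):

```python
# One global trampoline takes only an integer ctx_id, because the runtime
# cannot pass closures as raw function pointers; per-callback state lives
# in a fixed-size context table (256 slots in D77).

HOST_CONTEXTS = [None] * 256   # slot -> {"fn": callable, "params": n}

def register_host_fn(fn, params):
    ctx_id = HOST_CONTEXTS.index(None)         # first free slot
    HOST_CONTEXTS[ctx_id] = {"fn": fn, "params": params}
    return ctx_id

def host_trampoline(stack, ctx_id):
    # pops the callback's arguments from the operand stack, calls the
    # registered Clojure fn, pushes the result back
    ctx = HOST_CONTEXTS[ctx_id]
    args = [stack.pop() for _ in range(ctx["params"])]
    stack.append(ctx["fn"](*args))

cid = register_host_fn(lambda a, b: a + b, 2)
stack = [3, 4]
host_trampoline(stack, cid)
```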


D79: Strategic Pivot — Native Production Track

Decision: Defer wasm_rt implementation. Pivot to native production track.

Rationale:

  • WasmGC: LLVM cannot emit WasmGC types, no timeline
  • Wasmtime GC: Cycle collection unimplemented
  • WASI Threads: Specification in flux
  • Native track has immediate high-value opportunities

Consequence: wasm_rt deferred until ecosystem matures. See src/wasm_rt/README.md for revival conditions.


D80: nREPL Memory Model — GPA-only, no ArenaAllocator

Decision: nREPL uses GPA directly for all allocations — both Env (persistent) and evalString (transient). No ArenaAllocator.

Rationale: ArenaAllocator.free() in Zig 0.15.2 performs "last allocation rollback" optimization. When persistent data (Vars) and transient data share the same arena, free/alloc cycles for transient data can overwrite persistent allocations.


D81: Build System Architecture — Pre-compiled Bootstrap Cache

Decision: Generate bootstrap cache at Zig build time, embed as binary data. User-facing paths: cljw file.clj (run) and cljw build file.clj -o app (single binary).

  • registerBuiltins() at startup (Zig function pointers not serializable)
  • restoreFromBootstrapCache (replaces loadBootstrapAll)
  • Full runtime always included in built binaries

Result: ~6x faster startup (~12ms → ~2ms).


D82: Namespace Naming Convention — clojure.* + cljw.* Split

Decision: Two-prefix convention (Babashka model):

  1. clojure.* — JVM Clojure-compatible namespaces
  2. cljw.* — ClojureWasm-unique extensions (cljw.wasm, cljw.http, cljw.build)
  3. user — Default namespace

clojure.java.* names kept for compatibility (matches Babashka's approach).


D83: HTTP Server Architecture — Blocking + Background Mode

Decision: cljw.http namespace with Ring-compatible handler model.

  1. Blocking mode (default): run-server runs accept loop in calling thread
  2. Background mode (with --nrepl): spawns background thread, returns immediately
  3. Build mode: returns nil during cljw build to prevent blocking
  4. Threading: Thread per connection with mutex on handler call

D84: Custom Wasm Runtime — Replace zware Dependency

Decision: Custom Wasm runtime in src/wasm/runtime/ replacing zware.

  1. Switch-based dispatch — works on all Zig backends (cross-compilation)
  2. Direct bytecode execution — no intermediate representation
  3. Wasm MVP + WASI Preview 1 — ~200 opcodes + SIMD (236 opcodes), 19 WASI functions

Scope: ~5300 LOC, 8 files. Zero external dependencies.


D85: NaN Boxing 4-Heap-Tag — 48-Bit Address Support

Decision: 4-heap-tag NaN boxing scheme for Value representation (8 bytes).

Encoding (top 16 bits of u64):

  • < 0xFFF9: float (raw f64 bits)
  • 0xFFF9: integer (48-bit signed)
  • 0xFFFB: constant (nil, true, false)
  • 0xFFFC: char (u21 codepoint)
  • 0xFFFD: builtin function pointer
  • 0xFFF8/0xFFFA/0xFFFE/0xFFFF: heap pointers (3-bit sub-type + 45-bit shifted address)

28 heap types across 4 tags. 8-byte alignment shift (addr >> 3) gives 48-bit effective address range. Negative NaN canonicalized to positive NaN.

Supersedes: D72 (original NaN boxing with 40-bit address, deferred).
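The integer tag and the addr >> 3 pointer shift can be modeled bit-for-bit (a Python illustration of the encoding described above; only the 0xFFF9 integer tag and the alignment shift are modeled, and the 3-bit sub-type field is folded into the tag for simplicity):

```python
INT_TAG = 0xFFF9
MASK_48 = (1 << 48) - 1

def box_int(n):
    # 48-bit signed payload under the integer tag
    assert -(1 << 47) <= n < (1 << 47)
    return (INT_TAG << 48) | (n & MASK_48)

def unbox_int(v):
    payload = v & MASK_48
    # sign-extend bit 47
    return payload - (1 << 48) if payload & (1 << 47) else payload

def box_heap(tag16, addr):
    # 8-byte alignment lets addr >> 3 represent a 48-bit address in 45 bits
    assert addr % 8 == 0
    return (tag16 << 48) | (addr >> 3)

def unbox_heap_addr(v):
    return (v & MASK_48) << 3
```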


D86: Wasm Interpreter Optimization Strategy (Non-JIT)

Decision: Three targeted optimizations for switch-based Wasm interpreter:

  1. VM reuse (36.7A): Cache Vm in WasmModule, reset() per invoke
  2. Branch target precomputation (36.7B): Lazy sidetable in WasmFunction.branch_table
  3. Memory/local optimization (36.7C): Abandoned — ROI too low

Results (hyperfine, ReleaseSafe):

Benchmark    Before    After    Speedup
wasm_call    931ms     118ms    7.9x
wasm_fib     11046ms   7663ms   1.44x
wasm_memory  192ms     26ms     7.4x
wasm_sieve   822ms     792ms    1.04x

Resolved: Register IR implemented in zwasm. LEB128 predecode and bytecode fusion done (Phase 37/45).


D87: ARM64 JIT PoC — Hot Loop Native Code Generation

Decision: Compile hot integer arithmetic loops to native ARM64 machine code at runtime. Interpreter-integrated, single-loop cache, automatic deopt.

Architecture:

  • Detection: Back-edge counter in vmRecurLoop. Threshold = 64 iterations.
  • Compilation: jit.zig's analyzeLoop extracts loop ops, compileLoop emits ARM64. Supported ops: branch_ne/ge/gt (locals/const), add/sub (locals/const), recur_loop.
  • NaN-box integration: SBFX unbox at entry, AND+ORR re-box at exit. used_slots bitset: only loads/checks slots referenced by loop body (skips closure self-ref).
  • THEN path skip: analyzeLoop uses exit_offset from data word to jump past exit code, only analyzing the ELSE path (loop body).
  • Execution: W^X transition (mmap WRITE → mprotect READ|EXEC), sys_icache_invalidate.
  • JitState per VM: Single cached loop. maxInt(u32) sentinel prevents retry after deopt.
  • Platform: ARM64 only (comptime check on builtin.cpu.arch == .aarch64). No-op on other architectures.

Results (hyperfine, ReleaseSafe, Apple M4 Pro):

Benchmark      Before (37.3)  After (37.4)  Speedup
arith_loop     31ms           3ms           10.3x
fib_recursive  16ms           16ms          1.0x
(cumulative)   53ms (base)    3ms           17.7x

Scope limitation: PoC targets simple integer loops only. Not compiled: function calls, heap allocation, string ops, collection ops. fib_recursive uses recursion (not loop), so JIT does not apply.

D88: Cross-Boundary Exception Handling — call_target_frame Scope Isolation

Decision: Add call_target_frame field to VM to prevent exception handlers from dispatching across VM/TreeWalk bridge boundaries.

Problem: When execution crosses VM→TW→VM boundaries (e.g. run-tests → do-testing → TW closure → derive throws), throw_ex dispatches to the nearest handler regardless of call boundary. This causes an outer scope's try/finally handler (from binding in do-testing) to intercept exceptions meant for the inner scope's try/catch (from TW's thrown?).

Architecture:

  • call_target_frame: usize on VM — set by callFunction to current frame_count
  • throw_ex: only dispatch to handler if handler.saved_frame_count > call_target_frame
  • executeUntil: same scope check before error handler dispatch
  • callFunction: errdefer restores sp, frame_count, and current_ns on error propagation, preventing stale frames from corrupting subsequent calls
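The scope check itself is small; a Python sketch (illustrative data shapes, not the VM's actual structures):

```python
# A handler may only catch a throw if it was installed deeper than the
# current call boundary; handlers outside a bridge call cannot intercept.

def find_handler(handlers, call_target_frame):
    # handlers: stack of {"saved_frame_count": n}; scan innermost-first
    for h in reversed(handlers):
        if h["saved_frame_count"] > call_target_frame:
            return h
    return None   # no in-scope handler: propagate across the bridge

handlers = [{"saved_frame_count": 1}, {"saved_frame_count": 4}]
inner = find_handler(handlers, call_target_frame=3)   # frame-4 handler is in scope
outer = find_handler(handlers, call_target_frame=5)   # nothing in scope
```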

Companion fix: Deferred var_ref resolution in bootstrap cache. var_ref constants (e.g. (var *testing-contexts*)) are serialized with ns/var names but cannot be resolved during readFnProtoTable (vars don't exist yet). Deferred fixup list resolves them after restoreEnvState.

Files (for the D87 JIT): src/native/vm/jit.zig (new, ~700 lines), src/native/vm/vm.zig (JitState integration).

D89: Four New Value Types — Array, BigInt, Ratio, BigDecimal (Phase 43)

Decision: Reserve NanHeapTag slots 29 (big_int), 30 (ratio+big_decimal), 31 (array) in Group D for four Value types needed by Phase 43 (Numeric Types + Arrays).

Types:

  • ZigArray: Mutable typed container (items: []Value, element_type: ElementType). ElementType enum: object, int, long, float, double, boolean, byte, short, char. Equivalent to JVM's Object[] / int[] etc. Identity equality (mutable).
  • BigInt: Arbitrary precision integer backed by std.math.big.int.Managed. Structural equality via Const.eql(). Printed as <digits>N.
  • Ratio: Exact rational as numerator/denominator BigInt pair. Structural equality. Printed as <num>/<den>.
  • BigDecimal: Scaled BigInt (unscaled × 10^(-scale)). Shares NanHeapTag slot 30 with Ratio via NumericExtKind discriminator enum(u8) as first field of both extern structs. Printed as <digits>M.

GC: Array traces all items. BigInt marks struct only (limbs managed by allocator). Ratio marks struct + numerator/denominator BigInt pointers. BigDecimal marks struct + unscaled BigInt pointer.

Files: src/common/value.zig, src/common/collections.zig, src/common/gc.zig, src/common/builtin/array.zig (new).

D90: Wasm Interpreter Optimization Strategy (Phase 44.5 Research)

Decision: Defer full Wasm interpreter optimization to post-alpha. The recommended approach for future work is predecoded IR + tail-call threaded dispatch.

Research findings (Phase 44.5):

  • Current: switch-based dispatch, inline LEB128 decode, lazy HashMap branch table
  • Baseline: wasm_fib 7539ms, wasm_sieve 782ms, wasm_call 121ms
  • Zig 0.15.2 supports @call(.always_tail, handler, ...) — verified working
  • Recommended approach: predecode bytecode → fixed-width IR (8 bytes/instr), then threaded dispatch via function pointer table + tail calls
  • Expected impact: 40-60% improvement (2-3x for fib)
  • Effort: HIGH (3177-line vm.zig, 200+ opcodes, control flow complexity)

Why defer: Alpha release priorities are correctness and documentation. The Clojure execution speed is already competitive (19/20 wins vs Babashka). Wasm speed is aspirational — users care about Clojure code speed first.

Post-alpha plan: Predecoded IR (eliminates LEB128 + bounds checks) → tail-call dispatch (eliminates branch misprediction) → superinstructions (fuse common patterns).

D91: Directory Restructure — Pipeline-Based Layout

Decision: Restructure src/ from legacy common/native/ two-tier layout to pipeline-oriented structure where each compilation stage is a top-level directory.

Before: src/common/ (Reader, Analyzer, Compiler, Builtins, Value all mixed), src/native/ (just VM + TreeWalk). Pipeline structure invisible from outside.

After:

src/
  reader/      → Stage 1: Source → Form
  analyzer/    → Stage 2: Form → Node
  compiler/    → Stage 3: Node → Bytecode (was bytecode/)
  vm/          → Stage 4a: Bytecode → Value
  evaluator/   → Stage 4b: Node → Value (TreeWalk)
  runtime/     → Core types + lifecycle (was common/ loose files)
  builtins/    → Built-in functions (was common/builtin/)
  regex/       → Regex engine
  repl/        → nREPL + REPL (unchanged)
  wasm/        → WebAssembly runtime (flattened from wasm/runtime/)

Merges: strings+clj_string → strings, io+file_io+java_io → io, arithmetic+numeric → arithmetic. 70 → 66 files.

Rationale: OSS release visibility. New contributors can see the compilation pipeline from the directory listing. The common/native split was a wasm_rt-era artifact with no current meaning.

D92: zwasm Integration — External Wasm Runtime Dependency

Decision: Replace CW's internal wasm engine (9 files, ~9300 LOC) with zwasm as a GitHub URL dependency (v0.1.0, https://github.com/clojurewasm/zwasm). CW keeps a thin bridge file (src/wasm/types.zig) that wraps zwasm's public API into CW's Value system.

Before: CW had a frozen copy of the wasm runtime (vm, store, module, instance, opcode, predecode, memory, leb128, wasi) in src/wasm/. This was the Phase 35W engine, missing Register IR, ARM64 JIT, and post-Phase 45 optimizations.

After:

src/wasm/
  types.zig      → Bridge: delegates to zwasm.WasmModule, keeps Value↔u64 marshalling
  builtins.zig   → Unchanged (imports from types.zig)
  wit_parser.zig → Unchanged (CW-specific WIT handling)

Bridge design: WasmModule.inner: *zwasm.WasmModule delegation pattern. Host function trampoline uses zwasm.Vm for stack access, zwasm.inspectImportFunctions for import type resolution. The bridge handles Value↔u64 conversion, HostContext, and Clojure imports map → []zwasm.ImportEntry translation.

Build: build.zig.zon GitHub URL dependency (v0.1.0 tag tarball). zig build auto-fetches zwasm. Native targets only (wasm32-wasi does not link zwasm).

Benefits:

  • -9300 LOC in CW (maintenance burden eliminated)
  • CW automatically inherits zwasm improvements (Register IR, JIT, spec compliance)
  • zwasm remains fully independent (no CW-specific code)

zwasm API additions (generic, not CW-specific):

  • pub const Vm — re-export for embedder host function access
  • inspectImportFunctions() — pre-analysis utility for import type metadata

D93: case* Special Form — Hash-Based Constant Dispatch

Decision: Implement case* as a proper special form across the full pipeline (Analyzer → Node → Compiler + TreeWalk), replacing the previous cond-based case macro with the upstream case*/hash-dispatch design.

Node type: CaseNode (expr, shift, mask, default, clauses, test_type, skip_check). Three test types: :int (integer identity), :hash-equiv (hash + equality), :hash-identity (hash + identity for interned types like keywords).

Compiler: Equality-check chain — for each clause: dup expr, load constant, eq, conditional jump. O(n) but correct. Future: switch to table jump for :compact.

TreeWalk: Hash-based dispatch — compute shift-masked hash, scan clauses for match, optional skip-check for hash collision buckets.
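The hash-dispatch path can be sketched like this (an illustrative Python simulation; the shift/mask values and clause layout are invented, and integer keys are used so Python's hash is stable):

```python
# Bucket the test value's shift-masked hash, then confirm with an equality
# check (the :hash-equiv case) unless skip-check says the bucket is unique.

def case_star(x, shift, mask, clauses, default):
    # clauses: bucket -> (test-constant, result)
    bucket = (hash(x) >> shift) & mask
    entry = clauses.get(bucket)
    if entry is not None:
        const, result = entry
        if const == x:          # equality confirmation against the constant
            return result
    return default

shift, mask = 0, 7
clauses = {(hash(k) >> shift) & mask: (k, v)
           for k, v in [(10, "ten"), (21, "twenty-one")]}
```

The macro's prep-ints/prep-hashes work corresponds to choosing shift/mask so buckets rarely collide; when they do, the equality check (or a collision bucket with skip-check off) keeps dispatch correct.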

case macro: Ported from upstream. Uses prep-ints/prep-hashes to compute optimal shift/mask parameters. Helper functions: shift-mask, maybe-min-hash, case-map, fits-table?, prep-ints, merge-hash-collisions, prep-hashes.

Also fixed: Vector destructuring (makeNthCall) now uses 3-arity nth with nil default, matching Clojure's behavior of returning nil for missing positions instead of throwing.

D94: GC Thread Safety — Mutex + Stop-the-World Architecture

Decision: Make MarkSweepGc thread-safe via a single gc_mutex that serializes all allocation (msAlloc/msFree/msResize/msRemap) and collection (collectIfNeeded, gcCollect = traceRoots + sweep) paths.

Design: Global GC lock approach — simplest correct implementation. The mutex is held for the entire mark+sweep cycle, preventing allocation during collection (stop-the-world). Multiple threads serialize on the mutex for allocation.

Thread registry: ThreadRegistry tracks active mutator thread count via atomic counter. Infrastructure for future safe-point integration — when a thread triggers collection, it will signal others to pause at safe points, wait for all to reach safe points, then collect with combined root sets.

Scope: Phase 48.2 adds the mutex + registry. Thread spawning (48.3) will integrate safe-point coordination. Future optimization: concurrent marking, thread-local allocation buffers (TLABs), generational collection.

D95: Protocol/ProtocolFn Serialization — Eliminating Startup Re-evaluation

Problem: After D81 (bootstrap cache), Protocol and ProtocolFn values were not serializable. restoreFromBootstrapCache called reloadProtocolNamespaces to re-evaluate protocols.clj + reducers.clj (~440 lines) via TreeWalk at every startup, causing 23.3ms startup time and 226MB memory usage.

Decision: Serialize Protocol and ProtocolFn values in the bootstrap cache. Protocol stores name + method_sigs + impls (nested map of type_key → method_map). ProtocolFn stores method_name + protocol var reference (ns + name), resolved via deferred fixup after env restore. Fn closure_bindings also serialized.

Cache invalidation: Protocol gains a generation counter, incremented on every extend_type_method / extend-type call. ProtocolFn inline cache checks cached_generation == protocol.generation to detect stale entries. This fixes a latent bug where VM-compiled reify forms share compile-time type keys, causing the monomorphic cache to return stale methods when the same type key gets new impls. Also fixed extend_type_method to replace existing methods (same name) in the method map instead of always appending.
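The generation-counter check reduces to a small invariant, sketched here in Python (illustrative class shapes, not CW's Zig structs):

```python
# An inline cache is valid only while the protocol's generation matches the
# generation recorded when the cache was filled; extend bumps the counter.

class Protocol:
    def __init__(self):
        self.impls, self.generation = {}, 0
    def extend(self, type_key, method):
        self.impls[type_key] = method
        self.generation += 1            # invalidates every inline cache

class ProtocolFn:
    def __init__(self, proto):
        self.proto = proto
        self.cached_key = self.cached_method = None
        self.cached_generation = -1
    def invoke(self, type_key, arg):
        if (self.cached_key == type_key
                and self.cached_generation == self.proto.generation):
            return self.cached_method(arg)      # fast path: cache hit
        method = self.proto.impls[type_key]     # slow path + cache refill
        self.cached_key, self.cached_method = type_key, method
        self.cached_generation = self.proto.generation
        return method(arg)

p = Protocol(); f = ProtocolFn(p)
p.extend("T", lambda x: x + 1)
first = f.invoke("T", 1)                # fills the cache
p.extend("T", lambda x: x * 10)         # same type key, new impl
second = f.invoke("T", 1)               # stale cache detected, refilled
```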

Result: Startup 23.3ms → 5.3ms (4.4x), memory 226MB → 8.1MB (28x reduction). All upstream tests pass, no regression.

D96: Lazy-seq Realization Depth — VM Frames + Iterative Unwrapping

Problem: User-defined lazy-seq thunks (non-Meta path) consumed VM call frames proportional to nesting depth. VM FRAMES_MAX was 256, limiting thunk-based lazy-seq chains to ~200 levels.

Decision:

  1. Increase VM FRAMES_MAX from 256 to 1024 (4x headroom)
  2. Add iterative unwrapping in LazySeq.realize: when a thunk returns another lazy-seq, loop instead of recursing (matches JVM LazySeq.seq() pattern)
  3. Keep TreeWalk MAX_CALL_DEPTH at 512 (Zig stack limited in Debug builds)
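Point 2 is the structural fix; a Python sketch of the iterative unwrap (illustrative class, not CW's LazySeq):

```python
# If a thunk yields another lazy seq, loop instead of recursing, so
# realization depth no longer consumes call frames (the JVM LazySeq.seq()
# pattern mentioned above).

class LazySeq:
    def __init__(self, thunk):
        self.thunk = thunk
        self.realized = None
    def realize(self):
        val = self.thunk()
        while isinstance(val, LazySeq):   # iterative, constant stack depth
            val = val.thunk()
        self.realized = val
        return val

def nest(depth):
    # builds a chain of `depth` thunks, each yielding the next lazy seq
    if depth == 0:
        return LazySeq(lambda: [42])
    return LazySeq(lambda: nest(depth - 1))
```

With a recursive realize, `nest(10000).realize()` would blow the stack; the loop handles it at constant depth.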

Depth limits after D96:

  • Built-in map/filter/take (Meta path): effectively unlimited (Zig stack only)
  • User-defined lazy-seq via VM: ~1000 levels
  • User-defined lazy-seq via TreeWalk: ~500 levels
  • D74 filter chain collapsing: unlimited nested filters (sieve)

Trade-off: VM struct grows ~78KB (frames array). No measurable impact on binary size, startup, RSS, or benchmarks.


D97: Syntax-Quote Namespace Qualification at Read Time

Problem: JVM Clojure's syntax-quote resolves unqualified symbols to fully qualified names at read time (e.g., foo in namespace my.ns becomes my.ns/foo). CW's reader did not do this, causing macro expansions to produce unqualified symbols that failed when evaluated in different namespaces.

Decision: Reader gets current_ns field. expandSyntaxQuote qualifies unqualified symbols using current_ns (except special forms and auto-gensyms).

Scope: reader.zig (new field + qualifier logic), eval.zig (passes current ns to reader), bootstrap.zig (readFormsWithNs helper).


D98: spec.alpha Lazy Bootstrap Loading

Problem: spec.alpha (~500 LOC) added to eager bootstrap increased startup from 4.2ms to 5.9ms, exceeding the 5ms threshold.

Decision: spec.alpha and spec.gen.alpha are embedded in the binary but NOT loaded at startup. Instead, first (require '[clojure.spec.alpha :as s]) triggers loading via loadEmbeddedLib fallback in ns_ops.loadLib.

Trade-off: First require of spec.alpha has a ~1-2ms cost (one-time). Startup stays at baseline (4.1ms). Binary still embeds the source (~unchanged size since cache excludes spec.alpha serialization).

D99: Seq-Based Sequential Destructuring with &

Problem: Sequential destructuring [a b & r] used nth for positional access. This fails on maps (which are seqable but don't support nth), breaking s/keys which uses [[k v] & ks :as keys] patterns on maps.

Decision: When & is present in a sequential destructuring pattern, use seq/first/next chain instead of nth. Each (next seq_ref) gets a fresh local variable slot to avoid stale references. Without &, the original nth-based path is preserved for efficiency.

Matches JVM: Clojure's destructure uses seq/first/next when & is present. The CW analyzer now does the same.
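The difference between the two access strategies can be shown with a Python analogue (illustrative; iterators stand in for seq/first/next):

```python
# With `&` present, walk a seq with first/next semantics — this works on any
# seqable, including map entries, which do not support positional nth.

def destructure_amp(coll, n_fixed):
    s = iter(coll)                                # (seq coll)
    fixed = [next(s, None) for _ in range(n_fixed)]  # first + nil default
    rest = list(s)                                # remaining elements (& r)
    return fixed, rest

# s/keys-style pattern [[k v] & ks] over a map's entries:
fixed, rest = destructure_amp({"a": 1, "b": 2, "c": 3}.items(), 1)
```

An nth-based version would fail here, since `dict_items` has no positional indexing — the same failure mode the nth path had on Clojure maps.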

D100: GC Suppression During Macro Expansion

Problem: Macro expansion via syntax-quote generates lazy sequences (concat/list*). During VM execution of the macro function, GC at safe points can sweep Values captured in lazy-seq thunk closures. After macro return, valueToForm encounters dangling pointers in the result tree.

Root causes identified (6 fixes):

  1. valueToForm didn't copy GC-allocated string data to node_arena
  2. ProtocolFn/MultiFn inline caches not traced by GC
  3. refer() stored GC-allocated symbol name pointers
  4. Protocol dispatch created 3 temporary HeapStrings per cache miss
  5. During macro callFnVal, lazy-seq closure-captured Values swept by GC
  6. During valueToForm, lazy-seq realization triggers GC while result tree unrooted

Decision: Suppress GC collection during the entire expandMacro scope (callFnVal + valueToForm). Added suppress_count to MarkSweepGc. Collection is deferred, not skipped — next safe point after unsuppress will collect normally. Macro expansion allocations are bounded (result tree size), so memory pressure is acceptable.

Additionally: getByStringKey on PersistentArrayMap eliminates all temporary HeapString allocations in protocol dispatch hot path.

D101: Java Interop Architecture

Problem: Java interop code was scattered across analyzer.zig (rewrite tables), strings.zig (javaMethodFn), predicates.zig, arithmetic.zig, system.zig, and registry.zig. Adding a new Java class required changes in 3+ files. URI, File, and UUID classes were needed for library compatibility (hiccup, web apps, scripts).

Decision: Extract interop into a dedicated src/interop/ module:

src/interop/
  rewrites.zig       -- Static field + method rewrite tables
  dispatch.zig       -- Instance method dispatch (__java-method)
  constructors.zig   -- Constructor dispatch (__interop-new)
  classes/
    uri.zig          -- java.net.URI
    file.zig         -- java.io.File
    uuid.zig         -- java.util.UUID

Object model: Java class instances = PersistentArrayMap with :__reify_type keyword key (e.g., {:__reify_type "java.util.UUID" :uuid "550e8400-..."}). No new Value tags needed. Works with GC (maps are traced). type builtin reads :__reify_type. str delegates to .toString() for class instances. prn prints tagged literals (e.g., #uuid "..." for UUID).

Constructor syntax: (ClassName. args...) and (new ClassName args...) are analyzer rewrites to (__interop-new "fqcn" args...). :import stores FQCN: (:import (java.net URI)) → (def URI 'java.net.URI).

Adding new classes: 1 new file in classes/ + 1 registration in constructors.zig + dispatch.zig + rewrites.zig. Down from 3+ files.

D102: Bootstrap Cache FnProto Forward-Reference Fix

Problem: readFnProtoTable in serialize.zig assigned an uninitialized pointer array to self.fn_protos before populating it. When a FnProto's constant pool contained a fn_val referencing a higher-indexed proto (forward reference), the fn_val got an uninitialized proto pointer (0xaaaaaaaaaaaaaaaa). This caused GC crashes under heavy allocation pressure when tracing fn_val chains. Triggered by cl-format adding ~50 closures to the pprint namespace.

Fix: Two-pass deserialization. Pass 1: pre-allocate all FnProto structs and populate the pointer array. Pass 2: deserialize content into pre-allocated structs. All proto pointers are now valid before any constant pool deserialization begins.
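The two-pass shape, sketched in Python (the record format here is invented for illustration; the real code reads binary cache data):

```python
# Pass 1 pre-creates every proto so that forward references encountered in
# pass 2 always point at a valid (if not yet populated) object.

class FnProto:
    def __init__(self):
        self.constants = []

def read_proto_table(records):
    protos = [FnProto() for _ in records]       # pass 1: all pointers valid
    for proto, record in zip(protos, records):  # pass 2: fill content
        for c in record:
            # an int constant models a fn_val referencing another proto,
            # possibly a higher-indexed (forward) one
            proto.constants.append(protos[c] if isinstance(c, int) else c)
    return protos

# proto 0's constant pool forward-references proto 1
table = read_proto_table([[1, "x"], ["y"]])
```

With the single-pass version, the forward reference in proto 0 would have captured an uninitialized pointer — exactly the 0xaaaaaaaaaaaaaaaa crash described above.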


D103: -Dwasm=false Build Option

Decision: Add -Dwasm=false compile-time feature flag to exclude zwasm dependency entirely, producing a smaller binary for users who don't need Wasm FFI.

Implementation: Zig comptime branching in 8 files (types.zig, builtins.zig, wit_parser.zig, registry.zig, main.zig, deps.zig, root.zig, nrepl.zig). When enable_wasm=false, zwasm is never @import'd (lazy analysis ensures no linker dependency). Value enum tags (wasm_module=26, wasm_fn=27) are retained for serialization compatibility. DCE handles unreachable dispatch paths in vm.zig, tree_walk.zig, gc.zig automatically.

Result: Default 4.25MB → wasm=false 3.68MB (-570KB, -13%).


D104: Lazy Bootstrap — Deferred NS Deserialization

Decision: Defer deserialization of non-essential namespaces from startup to require time. Only clojure.core, clojure.core.protocols, and user are restored eagerly. The remaining 12 eager namespaces (walk, template, test, set, data, repl, java.shell, java.io, pprint, stacktrace, zip, core.reducers) are recorded as deferred entries with byte offsets into the bootstrap cache.

Cache format: Unchanged binary format. The Deserializer scans each NS at startup but only fully restores essential ones — non-essential ones are skipped via skipNamespaceData() (which parses binary structure to advance the offset) and their start positions recorded in a module-level deferred_ns_entries map.

Key design choices:

  • Module-level globals for deferred state (Deserializer is stack-local, goes out of scope)
  • @embedFile data has static lifetime — no copy needed for deferred reads
  • FnProto table fully deserialized at startup (shared across all NS); unresolvable var_refs moved to global_deferred_var_refs list via resolveOrDeferVarRefs()
  • Recursive dependency resolution: restoreDeferredNs calls restoreFromDeferredCache when a refer/alias target NS is itself deferred. Entry removed before call to prevent cycles.
  • resolveGlobalDeferredRefs() called after each deferred NS restoration to resolve newly-available var_refs and protocol fns.

Result: Startup 5.2ms → 4.6ms (-12%). RSS 9.3MB → 7.4MB (-20%).

Note: D104 will be removed in Phase 83E when all core NS are Zig builtins and bytecode deserialization is eliminated entirely.


D105: Architecture v2 — InterOp Unification & All-Zig Core

Decision: Major architectural evolution across 5 phases (83A-83E). Full design: .dev/interop-v2-design.md.

Motivation: D101 Java InterOp works but has structural issues: fragmented registration (5+ files per class), silent nil on unknown methods, Exception. returning raw string, byte-level string ops, handle safety gaps. Additionally, .clj bootstrap adds startup cost and CLJW marker maintenance.

Changes:

  1. Exception Unification (83A): (Exception. "msg") → map with :__ex_info. Comptime exception hierarchy table. isSubclassOf for catch dispatch. Unknown methods → error.

  2. ClassDef Registry (83B): Single ClassDef struct per Java class. One registry, consulted by analyzer + dispatcher + instance? + constructors. Protocol-based method dispatch. Method Missing → error. Supersedes D101's 5-file registration pattern.

  3. UTF-8 Codepoint (83C): String index operations use codepoint semantics. Internal UTF-8 representation unchanged. std.unicode.Utf8Iterator for indexing.

  4. Handle Safety (83D): Closed flag, use-after-close detection, GC finalization.

  5. All-Zig Core (83E): All standard-library functions → Zig builtins. .clj loading reserved for user code and libraries only. Eliminates bytecode deserialization (D104 becomes unnecessary), CLJW markers, VM interpretation overhead for core functions.
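The registry idea in change (2) — one table consulted by analyzer, dispatcher, instance?, constructors, and catch dispatch — can be sketched as below. Only ClassDef and isSubclassOf are names from the text; the fields and sample hierarchy entries are illustrative:

```zig
const std = @import("std");

// Illustrative ClassDef shape: one struct per Java class, one registry.
pub const ClassDef = struct {
    name: []const u8,
    parent: ?[]const u8 = null, // edge in the comptime exception hierarchy
};

// Sample comptime table (real registry covers all supported classes).
const class_registry = [_]ClassDef{
    .{ .name = "java.lang.Throwable" },
    .{ .name = "java.lang.Exception", .parent = "java.lang.Throwable" },
    .{ .name = "java.lang.RuntimeException", .parent = "java.lang.Exception" },
};

fn lookup(name: []const u8) ?ClassDef {
    for (class_registry) |def| {
        if (std.mem.eql(u8, def.name, name)) return def;
    }
    return null;
}

/// Used by catch dispatch: walk parent edges up the hierarchy.
pub fn isSubclassOf(child: []const u8, ancestor: []const u8) bool {
    var cur: ?[]const u8 = child;
    while (cur) |name| {
        if (std.mem.eql(u8, name, ancestor)) return true;
        cur = if (lookup(name)) |def| def.parent else null;
    }
    return false;
}
```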

Object model: Unchanged — class instances remain PersistentArrayMap with :__reify_type. The change is in how classes are defined and dispatched.

Migration invariant: All tests pass after every sub-task. Incremental, defensive migration with benchmarks recorded at milestones.

## D106: extend-via-metadata Protocol Dispatch

Decision: Support :extend-via-metadata true on defprotocol. When set, protocol dispatch checks (meta obj) for a fully-qualified symbol key (e.g. ns/method-name) BEFORE the inline cache and the impls-map lookup.

Motivation: JVM Clojure feature used by Datafiable and Navigable in clojure.core.protocols. Without it, these protocols can't be extended via metadata, breaking upstream compatibility.

Implementation:

  • DefProtocolNode and Protocol gain extend_via_metadata: bool and defining_ns: ?[]const u8
  • Analyzer parses :extend-via-metadata true keyword option in defprotocol
  • Compiler encodes flag as first element of sigs vector
  • Dispatch order (both backends): metadata → inline cache → impls → "Object" fallback
  • Metadata dispatch is per-object (not cached) — two maps with same type can have different metadata, so the type-based inline cache must be bypassed
  • protocols.clj Datafiable/Navigable re-enabled with :extend-via-metadata true
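The four-step dispatch order can be sketched as a compilable toy. Only the order itself and the Protocol fields extend_via_metadata / defining_ns come from this entry; Value, TypeTag, and the stub lookups are illustrative:

```zig
const std = @import("std");

const Value = i64; // stand-in for the runtime Value type
const TypeTag = enum { map, vector, object };
const Fn = *const fn ([]const Value) Value;

const Protocol = struct {
    extend_via_metadata: bool = false,
    defining_ns: ?[]const u8 = null, // forms the FQ key, e.g. ns/method-name

    fn cacheLookup(_: Protocol, _: TypeTag) ?Fn {
        return null; // stub: type-keyed inline cache
    }
    fn implsLookup(_: Protocol, _: TypeTag) ?Fn {
        return null; // stub: impls map
    }
};

/// Stub for: (get (meta obj) 'defining-ns/method-name)
fn metaLookup(_: Value, _: ?[]const u8, _: []const u8) ?Fn {
    return null;
}

fn typeOf(_: Value) TypeTag {
    return .map;
}

fn dispatch(proto: Protocol, method_name: []const u8, args: []const Value) !Value {
    const self = args[0];
    // (1) Metadata dispatch — per-object, so it must run BEFORE the
    //     type-keyed inline cache: two maps of the same type can carry
    //     different metadata, and the cache would shadow that.
    if (proto.extend_via_metadata) {
        if (metaLookup(self, proto.defining_ns, method_name)) |f| return f(args);
    }
    // (2) Type-keyed inline cache (fast path).
    if (proto.cacheLookup(typeOf(self))) |f| return f(args);
    // (3) impls map, then (4) the "Object" fallback.
    if (proto.implsLookup(typeOf(self))) |f| return f(args);
    if (proto.implsLookup(.object)) |f| return f(args);
    return error.NoImplementation;
}
```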

## D107: Unified Namespace Registration Architecture (Phase R)

Decision: Self-describing NamespaceDef struct + generic registration/loading.

Motivation: 10 concrete problems, among them: 480 lines of copy-pasted registration boilerplate, 10 hand-written loadXxx() functions, a loadEmbeddedLib() string-comparison if-chain, mixed naming conventions, and split namespace responsibility.

Implementation:

  • NamespaceDef struct in registry.zig: name, builtins, macro_builtins, dynamic_vars, constant_vars, loading (pure_zig/eager_eval/lazy), embedded_source, extra_refers, extra_aliases, post_register, enabled
  • registerNamespace(): generic function replacing 20+ copy-pasted blocks
  • lib/defs.zig: aggregates all library NamespaceDef entries
  • inline for (all_namespace_defs) loop in registerBuiltins()
  • ns_loader.zig: generic loadNamespaceClj() + loadLazyNamespace() replaces hand-written loadXxx and loadEmbeddedLib if-chain
  • 30 lib/*.zig files (one per non-core namespace)
  • ~470 lines of boilerplate removed from registry.zig + bootstrap.zig
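Under this design, registration collapses to a data table plus one generic, comptime-unrolled loop. A sketch with the field list from above — the field types and helper signatures are guesses, not the real declarations:

```zig
// Field list from the text; types are illustrative.
pub const LoadStrategy = enum { pure_zig, eager_eval, lazy };

pub const BuiltinDef = struct { name: []const u8 };

pub const NamespaceDef = struct {
    name: []const u8,
    builtins: []const BuiltinDef = &.{},
    macro_builtins: []const BuiltinDef = &.{},
    dynamic_vars: []const []const u8 = &.{},
    constant_vars: []const []const u8 = &.{},
    loading: LoadStrategy = .pure_zig,
    embedded_source: ?[]const u8 = null,
    extra_refers: []const []const u8 = &.{},
    extra_aliases: []const [2][]const u8 = &.{},
    post_register: ?*const fn () anyerror!void = null,
    enabled: bool = true,
};

// lib/defs.zig aggregates these; sample entries only.
pub const all_namespace_defs = [_]NamespaceDef{
    .{ .name = "clojure.set" },
    .{ .name = "clojure.walk", .loading = .lazy },
};

pub fn registerBuiltins() !void {
    inline for (all_namespace_defs) |def| {
        if (def.enabled) try registerNamespace(def);
    }
}

fn registerNamespace(def: NamespaceDef) !void {
    _ = def; // stub: create NS, bind builtins/macros/vars, run post_register
}
```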

Status: Phases R1-R3 + R6 complete. R4 (core/ file moves) and R5 (requireLib extraction) deferred — purely organizational, no functional impact.

## D108: Pure Zig Runtime — Zero Embedded Clojure, 1NS=1File

Decision: CW is a pure Zig Clojure runtime. No self-hosting philosophy. All namespace implementations converge to self-contained Zig modules.

Principles:

  1. Zero embedded Clojure: Eliminate all evalString bootstrap. No .clj in the processing pipeline. LoadStrategy eager_eval → extinct. All vars are Zig builtins or Zig-registered macros.
  2. 1 NS = 1 File: Each lib/*.zig contains both NamespaceDef AND full implementation. No separate ns_*.zig files. Single touch-point per NS.
  3. Upstream mapping: Clear Clojure NS/var → Zig file:function mapping. Enables fast upstream change tracking.
  4. Behavioral compat, not structural compat: Upstream .clj implementations are reference for behavior, not structure. Zig implementations may differ internally for performance, binary size, or startup optimization.
  5. No self-hosting: Unlike JVM Clojure, CW does not define Clojure in Clojure. .clj files are for user code only.

Motivation: Embedded Clojure strings are fragile (no type checking, no tooling, startup cost from evalString). The ns_*.zig + lib/*.zig split creates redundant touch-points. Pure Zig enables comptime optimization, dead code elimination, and minimal binary size.

Migration path (incremental, regression-safe):

  1. Merge ns_*.zig → lib/*.zig (1NS=1File consolidation)
  2. Phase B.15-B.16: Convert remaining embedded Clojure to Zig
  3. Phase C: Eliminate bootstrap pipeline
  4. Phase E: Optimize (binary size, startup, benchmarks)

Status: In progress. 18 ns_*.zig files to merge into lib/*.zig.

## D109: Zone-Layered Architecture (Phase 97)

Decision: Strict 4-zone layered architecture with enforced dependency direction.

Zones:

```
Layer 0: src/runtime/   — foundational types (Value, collections, Env, GC, Namespace, Var)
                          NO imports from engine/, lang/, or app/
Layer 1: src/engine/    — processing pipeline (Reader, Analyzer, Compiler, VM, TreeWalk)
                          imports runtime/ only
Layer 2: src/lang/      — Clojure language (builtins, interop, lib namespaces)
                          imports runtime/ + engine/
Layer 3: src/app/       — application (main, CLI, REPL, deps, Wasm)
                          imports anything
```

Core technique: callFnVal vtable (function pointer table in runtime/dispatch.zig). Eliminates runtime/ → engine/ dependency (bootstrap.zig imported TreeWalk + VM). Engine/ sets function pointers at startup. Runtime/ calls through vtable.
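The vtable technique can be sketched as below. callFnVal, the dispatch module, and the layer roles come from the text; the stand-in types and stub names are illustrative:

```zig
// --- runtime/dispatch.zig (Layer 0): no engine imports anywhere. ---
const Value = i64; // stand-in for the real runtime Value

pub const CallFn = *const fn (f: Value, args: []const Value) anyerror!Value;

pub var call_fn_val: ?CallFn = null;

/// Layer-0 code (e.g. bootstrap) calls functions only through this.
pub fn callFnVal(f: Value, args: []const Value) !Value {
    const impl = call_fn_val orelse return error.EngineNotInstalled;
    return impl(f, args);
}

// --- engine/ (Layer 1): installs the pointer at startup. ---
fn engineCall(f: Value, args: []const Value) anyerror!Value {
    _ = args;
    return f; // stub for real TreeWalk/VM invocation
}

pub fn installEngine() void {
    call_fn_val = &engineCall;
}
```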

Motivation: CW's runtime/ had 112 upward imports (95 to builtins/, 2 to evaluator/, 5 to compiler/, 2 to vm/, 4 to analyzer/, 4 to reader/) — all caused by bootstrap.zig (3,624 LOC God object) and eval_engine.zig (2,556 LOC). NextClojureWasm's strict 3-zone model demonstrated that layered architecture prevents circular dependencies and makes refactoring tractable.

Plan: .dev/refactoring-plan.md (sub-tasks R0-R12). Rules: .claude/rules/zone-deps.md (auto-loads on src/ edits).

Result targets: 0 upward imports, bootstrap.zig < 200 LOC, main.zig < 200 LOC.

## D110: zwasm Allocator Injection — Eliminate Dual-GC

Date: 2026-03-08
Status: Future (depends on zwasm D128)

Decision: When zwasm implements allocator injection (D128), CW will pass its own GC-managed allocator to zwasm instead of letting zwasm use an internal Arena. This eliminates the dual-GC lifecycle mismatch.

Current problem: CW GC (Mark-Sweep) manages wasm Value objects (wasm_module, wasm_fn, wasm_instance). When CW GC sweeps these, it frees the CW-side wrapper, but zwasm's internal Arena retains the underlying memory. The Arena only frees on full deinit (process exit), so long-running CW processes that load/unload Wasm modules will leak zwasm-side memory.

Target state: CW passes its allocator to zwasm.Engine.init(cw_allocator). zwasm allocations become CW GC-visible. When CW GC sweeps a wasm Value, the underlying zwasm memory is also reclaimable.

Scope: Only zwasm's internal bookkeeping (module metadata, function tables, instance state). Wasm linear memory remains separately managed per spec.

Migration: Minimal CW changes — update wasm_types.zig to pass allocator at Engine construction. Requires zwasm D128 to be implemented first.

Related: zwasm D128, cw-new D13.

## D111: Zig 0.15.2 → 0.16.0 Migration

Date: 2026-04-27
Status: Done

Decision: Migrate the entire ClojureWasm tree from Zig 0.15.2 to 0.16.0, together with bumping zwasm to v1.11.0 (the first 0.16-compatible tag). Centralize the new std.Io model behind a process-wide accessor module runtime/io_default.zig so existing module-level mutexes, time helpers, env lookups, and sleeps don't have to thread io through every call site.

Why now: Zig 0.16 reshapes std.Io (Mutex/Condition/sleep/Timestamp all take io: Io), removes std.fs.cwd (replaced by std.Io.Dir), removes std.posix.{getenv,write,isatty}, and changes pub fn main() to pub fn main(init: std.process.Init). Staying on 0.15.2 indefinitely forfeits stdlib improvements and forces zwasm to maintain a parallel branch.

Approach:

  • zwasm-first vs detach-then-reattach: chose to upgrade zwasm to v1.11.0 from the start (rejected the original "detach + Phase 6 reattach" plan). Reason: v1.11.0 is already 0.16-ready, so keeping zwasm in saved a whole reattach phase and let wasm e2e/bridge tests stay green throughout.
  • io_default module: production entry points (main, cache_gen) call io_default.set(init.io) at startup, so all module-level mutexes / Condition variables / nanoTimestamp / sleep / getenv pick up the real cancelable io. Tests fall through to a process-wide std.Io.Threaded.init_single_threaded default, except for the few that need real spawn semantics (shell tests) which install a local Threaded.
  • libc linkage: zwasm v1.11.0 enables link_libc = true by default (D135 in zwasm). CW inherits the libc-linked binary; we use std.c.getenv / std.c.realpath / std.c.write / std.c.mprotect / std.c.getcwd in places where stdlib equivalents were removed. Stripping libc back out is a follow-up (F##; cf. zwasm's W46 sequence).
  • temporary stubs: HTTP server, nREPL, fancy line editor, and cljw build rely on std.net / std.posix.poll / raw-mode termios / std.fs.selfExePath — all gone or reshaped in 0.16. The full rewrite to std.Io.net + Smith fuzzing is non-trivial and was scoped out of this migration. Each is stubbed with a clear runtime error and tracked as a separate F## item.
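A sketch of the accessor pattern: the module name, the set-at-startup protocol, and the std.Io.Threaded single-threaded default come from the text, while the exact Zig 0.16 std.Io surface used here is assumed, not verified:

```zig
// runtime/io_default.zig — process-wide io accessor (sketch).
const std = @import("std");

var io_override: ?std.Io = null; // set once by production entry points
var fallback: std.Io.Threaded = .init_single_threaded;

/// main()/cache_gen call this with `init.io` at startup.
pub fn set(io: std.Io) void {
    io_override = io;
}

/// Module-level mutexes, Condition variables, nanoTimestamp, sleep, and
/// getenv all go through here: the real cancelable io in production, the
/// single-threaded Threaded default in tests.
pub fn get() std.Io {
    return io_override orelse fallback.io();
}
```

Tests that need real spawn semantics (the shell tests) bypass this by installing a local Threaded instance instead.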

Verification: 1324/1324 unit tests, 83/83 cljw test namespaces, 6/6 wasm e2e, deps.edn e2e all green on macOS aarch64. Bench history records pre-zig-016 and post-zig-016 entries; no individual benchmark regressed beyond noise; lazy_chain actually improved.

Related: zwasm D135 (Vm.io infra), Phase 7 follow-ups in .dev/checklist.md.