docs: ADR 0004 compute-backend session seam + StableHLO/MLIR family direction #447

Open
inureyes wants to merge 2 commits into main from docs/adr-0004-compute-backend-seam-direction
@inureyes

Member

Proposed ADR (status: Proposed, awaiting maintainer acceptance) that reframes the compute-backend seam direction, drafted per the discussion following #338 / #446.

Why

The refactor exists to host FuriosaAI (TCP/RNGD), Tenstorrent, and an OpenXLA-based path. The seam shipped in #446 draws the boundary at model load and returns the concrete MLX LoadedModel, which is insufficient for two reasons. First, LanguageModel::forward is itself MLX-coupled: it takes &mut [KVCache] and returns UniquePtr<MlxArray>. Second, all three targets are graph-compiler backends rather than eager-op backends, so neither a load factory nor an op-level trait fits.

What the ADR decides

Draw the seam at the inference-session / engine level with a token-level contract: prefill plus decode-step, on-device sampling, and backend-owned KV, returning token ids plus optional logprobs. The MLX hot path stays internal, so fusion, mx.compile, paged KV, and prompt-cache detach/adopt are preserved. CxxGenerator becomes the MLX session implementation.

Serve the non-MLX targets with one StableHLO/MLIR compiler-family backend (OpenXLA + Tenstorrent TT-MLIR, plus Furiosa if its compiler ingests StableHLO) rather than per-vendor engines. This collapses the execution families to two: MLX eager and StableHLO-compiler. MLX stays the full-featured reference backend.
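To make the shape of the token-level contract concrete, here is a minimal Rust sketch. It is not the trait the ADR defines: `InferenceSession`, `SampledToken`, `EchoSession`, and `generate` are hypothetical names invented for illustration, and the toy backend only pretends to sample (it returns the last token id plus one). What it tries to show is that the caller sees token ids and optional logprobs only, while KV state stays inside the backend.

```rust
/// One sampled token plus an optional log-probability.
/// (Hypothetical type, not taken from the ADR.)
#[derive(Debug, Clone, PartialEq)]
pub struct SampledToken {
    pub id: u32,
    pub logprob: Option<f32>,
}

/// Hypothetical token-level session contract: the backend owns KV state and
/// sampling, and the caller only ever exchanges token ids.
pub trait InferenceSession {
    /// Consume the prompt, populate backend-owned KV, return the first sampled token.
    fn prefill(&mut self, prompt: &[u32]) -> Result<SampledToken, String>;
    /// Feed back the previously sampled token and return the next one.
    fn decode_step(&mut self, last: u32) -> Result<SampledToken, String>;
}

/// Toy backend for illustration only: it tracks how many positions its
/// "KV cache" holds and "samples" the last token id plus one.
pub struct EchoSession {
    kv_len: usize,
}

impl EchoSession {
    pub fn new() -> Self {
        Self { kv_len: 0 }
    }

    pub fn kv_len(&self) -> usize {
        self.kv_len
    }
}

impl InferenceSession for EchoSession {
    fn prefill(&mut self, prompt: &[u32]) -> Result<SampledToken, String> {
        let last = *prompt.last().ok_or_else(|| "empty prompt".to_string())?;
        self.kv_len = prompt.len();
        Ok(SampledToken { id: last.wrapping_add(1), logprob: None })
    }

    fn decode_step(&mut self, last: u32) -> Result<SampledToken, String> {
        if self.kv_len == 0 {
            return Err("decode_step called before prefill".to_string());
        }
        self.kv_len += 1;
        Ok(SampledToken { id: last.wrapping_add(1), logprob: None })
    }
}

/// Backend-agnostic driver: one prefill, then decode steps until `max_new`
/// tokens have been produced. No tensor or KV type appears in its signature.
pub fn generate(
    session: &mut dyn InferenceSession,
    prompt: &[u32],
    max_new: usize,
) -> Result<Vec<u32>, String> {
    let mut out = Vec::with_capacity(max_new);
    if max_new == 0 {
        return Ok(out);
    }
    let mut tok = session.prefill(prompt)?;
    out.push(tok.id);
    while out.len() < max_new {
        tok = session.decode_step(tok.id)?;
        out.push(tok.id);
    }
    Ok(out)
}
```

Under this sketch, an MLX session and a StableHLO-compiler session would both sit behind `&mut dyn InferenceSession`, and the driver would not need to know which one it is calling.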

The select_backend selection skeleton and the default-off experimental-backend feature gate from #446 are kept. The ComputeBackend trait contract from #446 is marked provisional and will be superseded by the session contract before any non-MLX engine lands.
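As a rough illustration of a selection skeleton combined with a default-off gate, the sketch below picks between the two execution families. Only the function name `select_backend` comes from this PR; the signature, the `BackendFamily` enum, and the `"mlx"` / `"stablehlo"` strings are assumptions. The gate is passed in as a plain `bool` so the snippet compiles on its own; in a real crate it would more likely come from something like `cfg!(feature = "experimental-backend")`.

```rust
/// The two execution families the ADR collapses to.
/// (Hypothetical enum; the variant names are illustrative.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendFamily {
    MlxEager,
    StableHloCompiler,
}

/// Hypothetical selection skeleton. `requested` is an optional backend name
/// (for example from a CLI flag); `experimental_enabled` stands in for the
/// default-off feature gate.
pub fn select_backend(
    requested: Option<&str>,
    experimental_enabled: bool,
) -> Result<BackendFamily, String> {
    // MLX is the default and the full-featured reference backend.
    match requested.unwrap_or("mlx") {
        "mlx" => Ok(BackendFamily::MlxEager),
        // The compiler-family backend is only reachable behind the gate.
        "stablehlo" if experimental_enabled => Ok(BackendFamily::StableHloCompiler),
        "stablehlo" => Err(
            "the stablehlo backend requires the experimental-backend feature".to_string(),
        ),
        other => Err(format!("unknown backend: {other}")),
    }
}
```

Keeping the gate check inside the selector means a build without the gate can still parse the backend name and report a clear error instead of silently falling back to MLX.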

Open follow-ups named (not resolved here)

Three items are named: the compiler-family model-definition strategy (StableHLO emission); the coupling of KV, paged KV, and the scheduler to the MLX KVCache type (MLX-only at first, abstracted later); and whether Furiosa's compiler can ingest StableHLO, which is still unknown.

Validation plan

Prove the session contract with OpenXLA as a second reference backend on one or two hot models before the contract is locked and this ADR moves to Accepted.

This PR is documentation only and should not be merged until the direction is accepted. References #338 and PR #446.

@inureyes added the labels area:architecture (Architecture and code structure changes), status:review (Under review), and type:docs (Documentation improvements or additions) on Jun 26, 2026
…allel tracks, export-first model definition, quantized success bar)