docs: ADR 0004 compute-backend session seam + StableHLO/MLIR family direction by inureyes · Pull Request #447 · lablup/mlxcel

inureyes · 2026-06-26T02:07:25Z

Proposed ADR (status: Proposed, awaiting maintainer acceptance) that reframes the compute-backend seam direction, drafted per the discussion following #338 / #446.

Why

The refactor exists to host FuriosaAI (TCP/RNGD), Tenstorrent, and an OpenXLA-based path. The seam shipped in #446 draws the boundary at model load and returns the concrete MLX LoadedModel, which is insufficient for two reasons. First, LanguageModel::forward is itself MLX-coupled: it takes &mut [KVCache] and returns UniquePtr<MlxArray>. Second, all three targets are graph-compiler backends rather than eager-op backends, so neither a load factory nor an op-level trait fits.

What the ADR decides

Draw the seam at the inference-session / engine level with a token-level contract: prefill plus decode-step, on-device sampling, backend-owned KV, and a return value of token ids plus optional logprobs. The MLX hot path stays internal so that fusion, mx.compile, paged KV, and prompt-cache detach/adopt are preserved, and CxxGenerator becomes the MLX session implementation.

Serve the non-MLX targets with one StableHLO/MLIR compiler-family backend (OpenXLA + Tenstorrent TT-MLIR, plus Furiosa if its compiler ingests StableHLO) rather than per-vendor engines. This collapses the execution families to two: MLX eager and StableHLO-compiler. MLX stays the full-featured reference backend.
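To make the token-level contract concrete, here is a minimal Rust sketch of what such a session trait could look like. It is not the ADR's actual definition: the names InferenceSession, StepOutput, ToySession, and generate are hypothetical, and the toy backend only exists to show that a caller can drive generation without ever seeing a KV cache or a tensor type.

```rust
/// Token id as produced by a tokenizer (hypothetical alias, not from the ADR).
pub type TokenId = u32;

/// Result of one prefill or decode step: a sampled token plus an optional logprob.
#[derive(Debug, Clone, PartialEq)]
pub struct StepOutput {
    pub token: TokenId,
    pub logprob: Option<f32>,
}

/// Hypothetical token-level session contract. Sampling and KV state live
/// behind the trait, so no backend array or cache type crosses the seam.
pub trait InferenceSession {
    /// Consume the whole prompt and return the first sampled token.
    fn prefill(&mut self, prompt: &[TokenId]) -> StepOutput;
    /// Feed back the last token and return the next sampled token.
    fn decode_step(&mut self, last: TokenId) -> StepOutput;
}

/// Toy backend used only to exercise the trait shape; it "samples" last + 1.
pub struct ToySession {
    kv_len: usize, // stands in for backend-owned KV state
}

impl ToySession {
    pub fn new() -> Self {
        Self { kv_len: 0 }
    }

    /// Number of positions the toy backend has "cached" so far.
    pub fn kv_len(&self) -> usize {
        self.kv_len
    }
}

impl InferenceSession for ToySession {
    fn prefill(&mut self, prompt: &[TokenId]) -> StepOutput {
        self.kv_len = prompt.len();
        let token = prompt.last().map_or(0, |t| t.wrapping_add(1));
        StepOutput { token, logprob: None }
    }

    fn decode_step(&mut self, last: TokenId) -> StepOutput {
        self.kv_len += 1;
        StepOutput { token: last.wrapping_add(1), logprob: Some(0.0) }
    }
}

/// Backend-agnostic generation loop: it only sees token ids.
pub fn generate(
    session: &mut dyn InferenceSession,
    prompt: &[TokenId],
    max_new: usize,
) -> Vec<TokenId> {
    let mut out = Vec::with_capacity(max_new);
    if max_new == 0 {
        return out;
    }
    let mut step = session.prefill(prompt);
    out.push(step.token);
    while out.len() < max_new {
        step = session.decode_step(step.token);
        out.push(step.token);
    }
    out
}
```

Under this shape, an MLX session and a StableHLO-compiler session would both implement the same two methods, and the caller-side loop would not change between them. Batching, stop conditions, and prompt-cache detach/adopt are deliberately left out of the sketch.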

The select_backend selection skeleton and the default-off experimental-backend feature gate from #446 are kept. The ComputeBackend trait contract from #446 is marked provisional and will be superseded by the session contract before any non-MLX engine lands.
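A default-off gate in front of backend selection could look roughly like the sketch below. The real select_backend signature from #446 is not shown in this PR, so everything here is an assumption: the BackendKind enum, the string names "mlx" and "stablehlo", the helper select_backend_with, and the use of a Cargo feature literally named experimental-backend.

```rust
/// The two execution families named in the ADR (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    Mlx,
    StableHlo,
}

/// Selection logic with the gate passed in explicitly, so it can be
/// exercised without rebuilding with a different feature set.
pub fn select_backend_with(
    requested: Option<&str>,
    experimental_enabled: bool,
) -> Result<BackendKind, String> {
    match requested {
        // No explicit request: MLX is the default reference backend.
        None | Some("mlx") => Ok(BackendKind::Mlx),
        Some("stablehlo") if experimental_enabled => Ok(BackendKind::StableHlo),
        Some("stablehlo") => {
            Err("backend 'stablehlo' requires the experimental-backend feature".to_string())
        }
        Some(other) => Err(format!("unknown backend '{other}'")),
    }
}

/// Entry point: the gate is resolved at compile time and is off unless the
/// (assumed) Cargo feature is enabled.
pub fn select_backend(requested: Option<&str>) -> Result<BackendKind, String> {
    select_backend_with(requested, cfg!(feature = "experimental-backend"))
}
```

Keeping the gate as a plain boolean in the inner function is only a convenience for testing; the point of the sketch is that a default build can never select a non-MLX engine by accident.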

Open follow-ups named (not resolved here)

Three items are named: the compiler-family model-definition strategy (StableHLO emission); the coupling of KV, paged KV, and the scheduler to the MLX KVCache type (MLX-only at first, abstracted later); and whether Furiosa's compiler can ingest StableHLO, which is still unknown.

Validation plan

Prove the session contract with OpenXLA as a second reference backend on one or two hot models before the contract is locked and this ADR moves to Accepted.

This PR is documentation only and should not be merged until the direction is accepted. References #338 and PR #446.

docs: add ADR 0004 on the compute-backend session seam and StableHLO/…

3652a58

…MLIR family direction

inureyes added area:architecture Architecture and code structure changes status:review Under review type:docs Documentation improvements or additions labels Jun 26, 2026

docs: record the 2026-06-26 ComputeBackend decisions in ADR 0004 (par…

485a23c

…allel tracks, export-first model definition, quantized success bar)

This was referenced Jun 26, 2026

refactor: redraw ComputeBackend as an inference-session engine contract and move the MLX path behind it (byte-identical) #448

Open

feat: OpenXLA reference backend - export-route spike through 4-bit quantized decode #449

Open

inureyes wants to merge 2 commits into main from docs/adr-0004-compute-backend-seam-direction
