feat: introduce a ComputeBackend seam to abstract the forward-execution engine by inureyes · Pull Request #446 · lablup/mlxcel

inureyes · 2026-06-25T23:05:59Z

Summary

Introduce a ComputeBackend seam so a future non-MLX execution engine can host forward() without routing through the MLX bridge. The motivating target is FuriosaAI TCP / RNGD, whose furiosa-opt toolchain compiles to a virtual ISA and cannot use MLX at all. This PR lands the seam only and is backend-neutral; it implements no Furiosa or TCP kernels.

Seam design

ComputeBackend (new src/backend/mod.rs) abstracts who executes forward(), not individual ops. It sits at the model-load boundary (called once per load), not at the per-token forward() call, so MLX graph fusion and mx.compile stay intact and no indirection enters the inner loop. The trait is drawn narrowly to the load entry points (load_model, load_model_with_adapter, load_model_with_tensor_parallel) so the MLX path keeps its concrete hot types exposed: paged KV, prompt-cache detach and adopt, and cache tensors are never type-erased behind Box<dyn>. A non-MLX engine implements the same LanguageModel forward contract behind a different backend.
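A minimal sketch of where the seam sits, assuming simplified signatures. The real trait's parameter and error types are not shown in this PR, so the String error, the world_size parameter, the toy LoadedModel with a vocab_size field, and the ToyBackend and run helpers are hypothetical. What it illustrates is that the backend is consulted once per load and hands back a concrete LoadedModel, so the per-token loop calls forward() with no Box<dyn> and no backend dispatch.

```rust
use std::path::Path;

/// Hypothetical stand-in for the per-token forward contract.
pub trait LanguageModel {
    /// Returns next-position logits for one input token.
    fn forward(&mut self, token: u32) -> Vec<f32>;
}

/// Hypothetical concrete model. Load entry points return this type directly,
/// so hot state (paged KV, cache tensors) is never hidden behind Box<dyn>.
pub struct LoadedModel {
    vocab_size: usize,
}

impl LanguageModel for LoadedModel {
    fn forward(&mut self, token: u32) -> Vec<f32> {
        // Placeholder logits: one-hot at token % vocab_size.
        let mut logits = vec![0.0; self.vocab_size];
        logits[token as usize % self.vocab_size] = 1.0;
        logits
    }
}

/// The seam: one method per load entry point, called once per model load.
pub trait ComputeBackend {
    fn load_model(&self, path: &Path) -> Result<LoadedModel, String>;
    fn load_model_with_adapter(&self, path: &Path, adapter: &Path) -> Result<LoadedModel, String>;
    fn load_model_with_tensor_parallel(
        &self,
        path: &Path,
        world_size: usize,
    ) -> Result<LoadedModel, String>;
}

/// Toy backend used only to exercise the sketch.
pub struct ToyBackend;

impl ComputeBackend for ToyBackend {
    fn load_model(&self, _path: &Path) -> Result<LoadedModel, String> {
        Ok(LoadedModel { vocab_size: 8 })
    }
    fn load_model_with_adapter(&self, path: &Path, _adapter: &Path) -> Result<LoadedModel, String> {
        self.load_model(path)
    }
    fn load_model_with_tensor_parallel(
        &self,
        path: &Path,
        _world_size: usize,
    ) -> Result<LoadedModel, String> {
        self.load_model(path)
    }
}

/// Consults the backend once, then runs the per-token loop on the concrete
/// LoadedModel; no backend dispatch happens inside the loop.
pub fn run<B: ComputeBackend>(
    backend: &B,
    path: &Path,
    tokens: &[u32],
) -> Result<Vec<Vec<f32>>, String> {
    let mut model = backend.load_model(path)?; // seam: once per load
    Ok(tokens.iter().map(|&t| model.forward(t)).collect()) // hot loop
}
```

Because run is generic over B: ComputeBackend, a different engine can be chosen at load time while the inner loop stays monomorphic.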

What moved behind MlxBackend

MlxBackend (src/backend/mlx.rs) is a zero-sized type that implements ComputeBackend by delegating unchanged to the existing crate::loading entry points. No loading logic is reimplemented; the same code runs whether reached directly or through the seam. The public load_model / load_model_with_adapter / load_model_with_tensor_parallel functions remain available (tests and bench harnesses still use them).
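A sketch of the zero-sized delegation pattern. The loading module below is a hypothetical stand-in for crate::loading (its body only checks that the directory exists), and the trait is trimmed to a single entry point. The point is that MlxBackend holds no state and forwards the call unchanged, so the seam and the direct call return the same result.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub struct LoadedModel {
    pub dir: PathBuf,
}

/// Hypothetical stand-in for the existing crate::loading module.
pub mod loading {
    use super::LoadedModel;
    use std::path::Path;

    pub fn load_model(path: &Path) -> Result<LoadedModel, String> {
        if !path.is_dir() {
            return Err(format!("model directory not found: {}", path.display()));
        }
        Ok(LoadedModel { dir: path.to_path_buf() })
    }
}

pub trait ComputeBackend {
    fn load_model(&self, path: &Path) -> Result<LoadedModel, String>;
}

/// Zero-sized: carries no state, it only names which loader runs.
#[derive(Clone, Copy, Debug, Default)]
pub struct MlxBackend;

impl ComputeBackend for MlxBackend {
    #[inline]
    fn load_model(&self, path: &Path) -> Result<LoadedModel, String> {
        // Pure delegation: no loading logic is reimplemented here.
        loading::load_model(path)
    }
}
```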

Feature flag

A default-off experimental-backend Cargo feature gates the optional non-MLX path. The src/backend/experimental.rs scaffold module and the Backend::Experimental enum variant are cfg-gated behind it, so shipping binaries (Apple Silicon, CUDA) compile no extra backend code. The scaffold reports "not implemented" rather than pretending to load; a real engine (and any hardware-feasibility gate) is future work. The feature-on version of select_backend() is the only code that reads a runtime backend switch (MLXCEL_BACKEND), and it is compiled only when the feature is enabled.
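One way the feature gating could look, as a sketch. The ExperimentalBackend name, the "experimental" value accepted for MLXCEL_BACKEND, and the stub loader bodies are assumptions for illustration. With the feature off (the default), cfg removes the second variant, its impl, and the env-reading select_backend(), leaving a single-variant enum and a constant constructor.

```rust
use std::path::Path;

#[derive(Debug)]
pub struct LoadedModel;

#[derive(Clone, Copy, Debug)]
pub struct MlxBackend;

impl MlxBackend {
    pub fn load_model(&self, _path: &Path) -> Result<LoadedModel, String> {
        Ok(LoadedModel) // stand-in for the existing MLX loader
    }
}

/// Hypothetical scaffold slot for a non-MLX engine; absent from default builds.
#[cfg(feature = "experimental-backend")]
#[derive(Clone, Copy, Debug)]
pub struct ExperimentalBackend;

#[cfg(feature = "experimental-backend")]
impl ExperimentalBackend {
    pub fn load_model(&self, _path: &Path) -> Result<LoadedModel, String> {
        // Report the gap instead of pretending to load.
        Err("experimental backend: not implemented".to_string())
    }
}

#[derive(Clone, Copy, Debug)]
pub enum Backend {
    Mlx(MlxBackend),
    #[cfg(feature = "experimental-backend")]
    Experimental(ExperimentalBackend),
}

impl Backend {
    #[inline]
    pub fn load_model(&self, path: &Path) -> Result<LoadedModel, String> {
        match self {
            Backend::Mlx(b) => b.load_model(path),
            #[cfg(feature = "experimental-backend")]
            Backend::Experimental(b) => b.load_model(path),
        }
    }
}

/// Default build: always MLX, with no environment read and no branch.
#[cfg(not(feature = "experimental-backend"))]
#[inline]
#[must_use]
pub fn select_backend() -> Backend {
    Backend::Mlx(MlxBackend)
}

/// Feature-on build: the only code that reads the runtime switch.
/// Treating "experimental" as the opt-in value is an assumption.
#[cfg(feature = "experimental-backend")]
#[must_use]
pub fn select_backend() -> Backend {
    match std::env::var("MLXCEL_BACKEND").as_deref() {
        Ok("experimental") => Backend::Experimental(ExperimentalBackend),
        _ => Backend::Mlx(MlxBackend),
    }
}
```

Under default features, let Backend::Mlx(_) = select_backend(); is an irrefutable pattern, which doubles as a compile-time check that no other variant exists.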

How codegen equivalence / no dispatch is guaranteed when the feature is off

Backend is an enum whose only variant under default features is Backend::Mlx, and MlxBackend is zero-sized, so Backend is itself zero-sized with no discriminant. select_backend() (#[inline], #[must_use]) is a constant constructor that always returns that one variant with no environment read and no branch. Every Backend method is a single-arm match marked #[inline]. After inlining (release builds use fat LTO and codegen-units = 1), select_backend().load_model(p) lowers to a direct call to the existing MLX loader, identical to the pre-seam build. The env-reading selection path and the second enum variant only exist under #[cfg(feature = "experimental-backend")], so they are absent from default codegen entirely.
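The zero-size claim can be checked at compile time. This sketch, using a hypothetical stand-in loader, asserts that MlxBackend and the single-variant Backend both have size 0. Because repr(Rust) enum layout is not formally specified, a const assertion is a cheap guard against a later change (such as an ungated second variant) silently reintroducing a discriminant. Whether the call actually lowers to a direct call still depends on inlining under the release profile, which an assertion cannot check.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
pub struct LoadedModel {
    pub source: String,
}

/// Hypothetical stand-in for the existing MLX loader.
pub mod loading {
    use super::LoadedModel;
    use std::path::Path;

    pub fn load_model(path: &Path) -> Result<LoadedModel, String> {
        Ok(LoadedModel { source: path.display().to_string() })
    }
}

#[derive(Clone, Copy, Debug)]
pub struct MlxBackend;

/// Default-feature shape: exactly one variant wrapping a zero-sized type.
#[derive(Clone, Copy, Debug)]
pub enum Backend {
    Mlx(MlxBackend),
}

// Compile-time checks: neither type occupies any bytes, so there is no
// discriminant to store, load, or branch on.
const _: () = assert!(std::mem::size_of::<MlxBackend>() == 0);
const _: () = assert!(std::mem::size_of::<Backend>() == 0);

#[inline]
#[must_use]
pub fn select_backend() -> Backend {
    Backend::Mlx(MlxBackend)
}

impl Backend {
    /// Single-arm match: once inlined, there is nothing left to dispatch on,
    /// so the optimizer can reduce select_backend().load_model(p) to
    /// loading::load_model(p).
    #[inline]
    pub fn load_model(&self, path: &Path) -> Result<LoadedModel, String> {
        match self {
            Backend::Mlx(_) => loading::load_model(path),
        }
    }
}
```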

How behavior preservation is ensured

Behavior is preserved by construction: the control-plane load call sites now go through the seam, and the seam methods delegate to the unchanged loaders, so the same loading and forward code runs. Rerouted call sites: src/commands/generate.rs (primary load: tensor-parallel / adapter / plain, plus the offline draft-model load), src/commands/chat.rs (REPL load), and src/server/model_worker.rs (both the batched and the legacy sequential worker loops, each covering tensor-parallel / adapter / plain). The pipeline-parallel branch keeps its own distributed loader and does not go through the seam.
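A before/after sketch of one rerouted call site, with hypothetical names: LoadOptions, load_direct, and load_via_seam are illustrative, and LoadedModel is a toy enum that records which loader ran. The branch order (tensor-parallel, then adapter, then plain) mirrors the order listed above but is otherwise an assumption. For every option combination, both versions reach the same unchanged loader.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: records which loader produced the model.
#[derive(Debug, PartialEq)]
pub enum LoadedModel {
    Plain(PathBuf),
    Adapter(PathBuf, PathBuf),
    TensorParallel(PathBuf, usize),
}

/// Stand-ins for the unchanged public loaders (hypothetical bodies).
pub mod loading {
    use super::LoadedModel;
    use std::path::Path;

    pub fn load_model(path: &Path) -> Result<LoadedModel, String> {
        Ok(LoadedModel::Plain(path.to_path_buf()))
    }
    pub fn load_model_with_adapter(path: &Path, adapter: &Path) -> Result<LoadedModel, String> {
        Ok(LoadedModel::Adapter(path.to_path_buf(), adapter.to_path_buf()))
    }
    pub fn load_model_with_tensor_parallel(path: &Path, world_size: usize) -> Result<LoadedModel, String> {
        Ok(LoadedModel::TensorParallel(path.to_path_buf(), world_size))
    }
}

#[derive(Clone, Copy, Debug)]
pub struct MlxBackend;

#[derive(Clone, Copy, Debug)]
pub enum Backend {
    Mlx(MlxBackend),
}

#[inline]
#[must_use]
pub fn select_backend() -> Backend {
    Backend::Mlx(MlxBackend)
}

impl Backend {
    // Each method is a single-arm match that delegates to the unchanged loader.
    #[inline]
    pub fn load_model(&self, path: &Path) -> Result<LoadedModel, String> {
        match self {
            Backend::Mlx(_) => loading::load_model(path),
        }
    }
    #[inline]
    pub fn load_model_with_adapter(&self, path: &Path, adapter: &Path) -> Result<LoadedModel, String> {
        match self {
            Backend::Mlx(_) => loading::load_model_with_adapter(path, adapter),
        }
    }
    #[inline]
    pub fn load_model_with_tensor_parallel(&self, path: &Path, world_size: usize) -> Result<LoadedModel, String> {
        match self {
            Backend::Mlx(_) => loading::load_model_with_tensor_parallel(path, world_size),
        }
    }
}

/// Hypothetical subset of the generate command's load options.
pub struct LoadOptions {
    pub model: PathBuf,
    pub adapter: Option<PathBuf>,
    pub tensor_parallel: Option<usize>,
}

/// Before: the call site picks a loader function directly.
pub fn load_direct(o: &LoadOptions) -> Result<LoadedModel, String> {
    if let Some(n) = o.tensor_parallel {
        loading::load_model_with_tensor_parallel(&o.model, n)
    } else if let Some(a) = &o.adapter {
        loading::load_model_with_adapter(&o.model, a)
    } else {
        loading::load_model(&o.model)
    }
}

/// After: identical branching, routed through select_backend().
pub fn load_via_seam(o: &LoadOptions) -> Result<LoadedModel, String> {
    let backend = select_backend();
    if let Some(n) = o.tensor_parallel {
        backend.load_model_with_tensor_parallel(&o.model, n)
    } else if let Some(a) = &o.adapter {
        backend.load_model_with_adapter(&o.model, a)
    } else {
        backend.load_model(&o.model)
    }
}
```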

What changed

src/backend/mod.rs (new): ComputeBackend trait, Backend enum, select_backend().
src/backend/mlx.rs (new): MlxBackend, delegates to crate::loading.
src/backend/experimental.rs (new, cfg-gated): scaffold plug-in slot for a non-MLX engine.
src/backend/tests.rs (new): scoped seam tests.
src/lib.rs: pub mod backend; and re-export of Backend, ComputeBackend, MlxBackend, select_backend.
Cargo.toml: default-off experimental-backend feature.
src/commands/generate.rs, src/commands/chat.rs, src/server/model_worker.rs: route loads through select_backend().

Test plan

cargo check --lib --features metal,accelerate
cargo check --bins --features metal,accelerate
cargo check --lib --features metal,accelerate,experimental-backend (gated module compiles)
cargo clippy --lib --tests --features metal,accelerate -- -D warnings
cargo clippy --lib --features metal,accelerate,experimental-backend -- -D warnings
cargo test --lib backend:: --features metal,accelerate (3 tests pass)
cargo fmt --check
Real-checkpoint temp-0 byte-identical token parity and throughput on the MLX path are owned by the maintainer's release-build parity gate.

Closes #338

Add a backend boundary so a future non-MLX execution engine can host forward() without routing through the MLX bridge. The motivating target is FuriosaAI TCP / RNGD, whose furiosa-opt toolchain compiles to a virtual ISA and cannot use MLX at all. This change lands the seam only and is backend-neutral; it implements no Furiosa or TCP kernels.

The new src/backend module defines a ComputeBackend trait that abstracts who executes forward(), not individual ops. The seam sits at the model-load boundary (once per load), not at the per-token forward() call, so MLX graph fusion and mx.compile are untouched and no indirection enters the inner loop. The trait is drawn narrowly to the load entry points so the MLX path keeps its concrete hot types (paged KV, prompt-cache detach and adopt, cache tensors) exposed.

MlxBackend (src/backend/mlx.rs) implements the trait by delegating unchanged to the existing crate::loading entry points, so the same loading and forward code runs whether reached directly or through the seam.

Selection folds away under default features. Backend is an enum whose only variant is Backend::Mlx, and MlxBackend is a zero-sized type, so the enum is zero-sized with no discriminant. select_backend() always returns that one variant with no environment read and no branch, and every Backend method is a single-arm match marked #[inline]. After inlining, select_backend().load_model(p) lowers to a direct call to the existing MLX loader, producing codegen identical to the pre-seam build.

The optional non-MLX path lives behind the default-off experimental-backend Cargo feature: the experimental module and the Backend::Experimental variant are cfg-gated, so shipping binaries (Apple Silicon, CUDA) compile no extra backend code, and the feature-on version of select_backend() (the only code that reads an env switch) is compiled only when the feature is enabled.

Behavior is preserved by construction: the control-plane load call sites (CLI generate and chat, both server model-worker loops, and the offline draft-model load) now call select_backend().load_model* instead of crate::load_model*, and those methods delegate to the unchanged loaders. The public load_model / load_model_with_adapter / load_model_with_tensor_parallel functions remain available.

Focused tests assert that selection resolves to MLX under default features, that the seam reaches the real MLX loader (it errors on a missing directory rather than hitting a backend shim), and, at compile time, that MlxBackend implements ComputeBackend and LoadedModel implements LanguageModel.
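A sketch of the three focused checks described above, written against a trimmed stand-in seam. The loader body (which fails on a missing directory), the error text, and the test names are assumptions, not the contents of src/backend/tests.rs. The third test has no runtime work; it passes as long as it compiles, because the generic trait bounds are checked at compile time.

```rust
use std::path::Path;

pub trait LanguageModel {
    fn forward(&mut self, token: u32) -> Vec<f32>;
}

#[derive(Debug)]
pub struct LoadedModel;

impl LanguageModel for LoadedModel {
    fn forward(&mut self, _token: u32) -> Vec<f32> {
        Vec::new() // placeholder; the real forward contract is not shown here
    }
}

pub trait ComputeBackend {
    fn load_model(&self, path: &Path) -> Result<LoadedModel, String>;
}

#[derive(Clone, Copy, Debug)]
pub struct MlxBackend;

impl ComputeBackend for MlxBackend {
    fn load_model(&self, path: &Path) -> Result<LoadedModel, String> {
        // Stand-in for the real loader: it fails on a missing directory.
        if path.is_dir() {
            Ok(LoadedModel)
        } else {
            Err(format!("no model directory at {}", path.display()))
        }
    }
}

#[derive(Clone, Copy, Debug)]
pub enum Backend {
    Mlx(MlxBackend),
}

#[must_use]
pub fn select_backend() -> Backend {
    Backend::Mlx(MlxBackend)
}

impl Backend {
    pub fn load_model(&self, path: &Path) -> Result<LoadedModel, String> {
        match self {
            Backend::Mlx(b) => b.load_model(path),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn selection_resolves_to_mlx() {
        // Irrefutable only while Mlx is the sole variant.
        let Backend::Mlx(_) = select_backend();
    }

    #[test]
    fn seam_reaches_the_real_loader() {
        let err = select_backend()
            .load_model(Path::new("/nonexistent/mlxcel-model"))
            .unwrap_err();
        assert!(err.contains("no model directory"));
    }

    #[test]
    fn seam_types_implement_their_traits() {
        // Checked at compile time; nothing is left to do at run time.
        fn implements_backend<T: ComputeBackend>() {}
        fn implements_language_model<T: LanguageModel>() {}
        implements_backend::<MlxBackend>();
        implements_language_model::<LoadedModel>();
    }
}
```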

inureyes added type:enhancement, priority:medium, area:architecture, and status:review labels Jun 25, 2026

docs: document the ComputeBackend seam and note the concrete-return-type tradeoff (#338)

6e7bfa6

inureyes added status:done and removed status:review labels Jun 25, 2026

inureyes merged commit 6ac38ec into main Jun 25, 2026
5 checks passed

inureyes deleted the feat/issue-338-compute-backend-seam branch June 25, 2026 23:42

inureyes merged 2 commits into main from feat/issue-338-compute-backend-seam
