Name	Name	Last commit message	Last commit date
parent directory ..
examples	examples
src/mlxcel	src/mlxcel
tests	tests
.gitignore	.gitignore
README.md	README.md
pyproject.toml	pyproject.toml

mlxcel (Python client)

A thin, pure-Python client for the mlxcel OpenAI-compatible inference server. It spawns and manages a local mlxcel serve process (managed mode) or connects to an already-running one (connect mode), auto-discovers the served model id, and exposes the raw openai client for the full API surface.

This is Phase 1 of Python integration: it builds entirely on the existing HTTP server, so it needs no native extension and no changes to the Rust inference core.

Install

pip install ./python          # from a repo checkout
pip install ./python[dev]     # with pytest, ruff, mypy for development

Requires Python 3.9+. The client itself is pure Python (openai>=1.40, httpx>=0.27). Managed mode additionally needs the mlxcel binary on PATH, or pass binary= / set MLXCEL_BIN.

Usage

import mlxcel

# Managed mode: spawn and supervise a local server.
with mlxcel.LLM("mlx-community/Qwen3-4B-4bit") as llm:
    print(llm.generate("def fib(n):", max_tokens=128, temperature=0.7))

    for delta in llm.stream("Write a haiku about autumn"):
        print(delta, end="", flush=True)

    print(llm.chat([{"role": "user", "content": "Hello"}], max_tokens=64))

    print(llm.model)        # resolved model id (auto-discovered)
    print(llm.models())     # ["<id>"]
    ids = llm.tokenize("hello world")
    print(llm.detokenize(ids))

    # Escape hatch: the full OpenAI surface (tools, response_format, logprobs).
    oai = llm.openai_client
    oai.chat.completions.create(model=llm.model, messages=[...], tools=[...])

# Connect mode: talk to an already-running server.
llm = mlxcel.LLM(base_url="http://localhost:8080/v1")  # TCP
llm = mlxcel.LLM(socket="/tmp/mlxcel.sock")            # Unix socket

Async usage mirrors the sync API via mlxcel.AsyncLLM (await llm.generate(...), async for delta in llm.stream(...)).

Modes

Managed mode (default when model= is given): the client spawns mlxcel serve, waits until /health returns ready, forwards server logs to the mlxcel.server Python logger, and stops the process on exit.
Connect mode (when base_url= or socket= is given): no subprocess; the client just talks to the server.
Passing both a model and a connect target raises MlxcelError.

On POSIX, managed mode defaults to a Unix domain socket for low-overhead local IPC. Keep socket paths short (sun_path is about 104 bytes on macOS, 108 on Linux); the default lives under /tmp. Pass socket= to override. Windows uses TCP.

Sampling parameters

generate, stream, chat, and chat_stream accept OpenAI sampling fields directly: max_tokens, temperature, top_p, stop, seed, presence_penalty, frequency_penalty, logit_bias, response_format. Server-specific knobs (top_k, min_p, repetition_penalty, DRY settings) are forwarded in the request body; you can also pass an explicit extra_body={...}.

Errors

MlxcelError: base class.
MlxcelServerError: launch, readiness, or crash failures (carries the server stderr tail).
MlxcelTimeoutError: readiness timeout.

HTTP and API errors surface as native openai SDK exceptions (for example openai.APIStatusError), not as Mlxcel* types.

Tests

pip install -e ./python[dev]
ruff check python
ruff format --check python
mypy python/src
pytest python/tests -m "not e2e"      # unit + lifecycle, no binary needed

The end-to-end test (-m e2e) is skipped unless MLXCEL_BIN points at a built binary:

MLXCEL_BIN=/path/to/mlxcel pytest python/tests/test_e2e.py -m e2e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

mlxcel (Python client)

Install

Usage

Modes

Sampling parameters

Errors

Tests

Uh oh!

FilesExpand file tree

python

Directory actions

More options

Directory actions

More options

Latest commit

History

python

Folders and files

parent directory

README.md

mlxcel (Python client)

Install

Usage

Modes

Sampling parameters

Errors

Tests