
fix(edge): surface backend-init state on local/llm/status#830

Open
rachmlenig wants to merge 5 commits into main from fix-edge-surface-backend-state-on-zenoh

Conversation


@rachmlenig rachmlenig commented Apr 29, 2026

Summary

  • Wire backend readiness through to the Zenoh local/llm/status heartbeat so flight-control can refuse LLM-dependent commands when the runtime is unhealthy, instead of issuing them and seeing them silently dropped at the inference layer.
  • Add admission control to ZenohIPC._handle_request: requests are now refused with error: "backend_unavailable" (and a reason string) when readiness is not READY, instead of being passed to inference where they fail after a long timeout or, worse, succeed against a half-initialized backend.

Background — what was broken

The edge runtime serves both HTTP clients and a Zenoh IPC bus used by drone flight-control. Three independent shortcuts combined into a flight-safety bug:

  1. _init_llama_backend() in server.py swallowed all init failures with a logger.warning — no flag, no re-raise, no signal to lifespan.
  2. ZenohIPC._heartbeat_loop published a hardcoded "status": "ready" every 5 seconds, regardless of whether llama.cpp was actually up.
  3. ZenohIPC._handle_request accepted every incoming request and passed it to _inference_fn, where (in offline mode) huggingface_hub.list_repo_files would raise OfflineModeIsEnabled mid-request. The error was logged and a generic {"error": "inference failed"} was published — but the heartbeat continued to claim "ready".

From flight-control's perspective: the bus reports a healthy LLM, requests are accepted, and they fail. There was no signal to refuse LLM-dependent commands. This PR fixes that — the version mismatch incident was the trigger, but the underlying flight-safety bug was the silent drop pattern.

What changed

New: runtimes/edge/core/backend_state.py
A process-wide BackendState dataclass with a Readiness enum (INITIALIZING / READY / DEGRADED / UNAVAILABLE), a threading.Lock-guarded set / mark_backend_initialized / snapshot API, and a singleton BACKEND_STATE.
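A minimal sketch of that shape, assuming plain dataclass fields and millisecond timestamps — only the names (Readiness, set, mark_backend_initialized, snapshot, BACKEND_STATE) come from the PR; the bodies are illustrative, and the timestamp bump inside mark_backend_initialized reflects the fix from a later commit in this PR:

```python
import threading
import time
from dataclasses import dataclass, field
from enum import Enum


class Readiness(str, Enum):
    INITIALIZING = "initializing"
    READY = "ready"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass
class BackendState:
    readiness: Readiness = Readiness.INITIALIZING
    reason: str = ""
    backend_initialized: bool = False
    last_transition_ms: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def set(self, readiness: Readiness, reason: str = "") -> None:
        with self._lock:
            self.readiness = readiness
            self.reason = reason
            self.last_transition_ms = int(time.time() * 1000)

    def mark_backend_initialized(self) -> None:
        with self._lock:
            self.backend_initialized = True
            # Also bump the timestamp so heartbeat consumers diffing on
            # last_transition_ms see this transition (per the later fix).
            self.last_transition_ms = int(time.time() * 1000)

    def snapshot(self) -> dict:
        with self._lock:
            return {
                "readiness": self.readiness.value,
                "reason": self.reason,
                "backend_initialized": self.backend_initialized,
                "last_transition_ms": self.last_transition_ms,
            }


# Process-wide singleton, as described above.
BACKEND_STATE = BackendState()
```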

runtimes/edge/server.py

  • _init_llama_backend() now records outcome on BACKEND_STATE: success → mark_backend_initialized(); ImportError / generic Exception → set(UNAVAILABLE, reason=...).
  • lifespan finalizes readiness from preload outcomes:
    • All preloads OK (or no preload configured) → READY
    • Some preloads failed → DEGRADED with reason listing failed model IDs
    • All preloads failed → UNAVAILABLE
  • ZenohIPC is constructed with state_provider=BACKEND_STATE.snapshot. Zenoh IPC is gated behind LLAMAFARM_ZENOH_ENABLED (default off).
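The preload-to-readiness projection above can be sketched as a pure function — the name finalize_readiness and the dict-of-outcomes signature are illustrative, not the PR's actual helper, but the mapping mirrors the rules listed, including the rule (pinned by a later commit) that UNAVAILABLE from a failed backend init must survive:

```python
from enum import Enum


class Readiness(str, Enum):  # mirrors the PR's enum
    INITIALIZING = "initializing"
    READY = "ready"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


def finalize_readiness(
    current: Readiness, preload_results: dict[str, bool]
) -> tuple[Readiness, str]:
    """Project preload outcomes onto readiness (sketch)."""
    # A failed backend init already set UNAVAILABLE; never mask its reason
    # with preload-derived state.
    if current is Readiness.UNAVAILABLE:
        return current, ""
    # All preloads OK, or no preload configured -> READY.
    if not preload_results or all(preload_results.values()):
        return Readiness.READY, ""
    failed = sorted(m for m, ok in preload_results.items() if not ok)
    # All preloads failed -> UNAVAILABLE.
    if len(failed) == len(preload_results):
        return Readiness.UNAVAILABLE, f"all preloads failed: {', '.join(failed)}"
    # Some preloads failed -> DEGRADED, reason lists the failed model IDs.
    return Readiness.DEGRADED, f"preload failed: {', '.join(failed)}"
```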

runtimes/edge/services/zenoh_ipc.py

  • __init__ accepts an optional state_provider (defaults preserve legacy behavior for any other caller).
  • _handle_request refuses non-READY requests immediately with error: "backend_unavailable", reason, and readiness fields — without calling _inference_fn.
  • Per-request inference failures (offline-mode trips, decode errors, prompt errors) are intentionally request-scoped: they publish an error response but do NOT mutate global readiness. Otherwise a single un-cached model would permanently block every other model on the bus until restart. The HTTP path treats the same exceptions as request-scoped 404s and Zenoh follows the same scope.
  • _heartbeat_loop publishes the live snapshot (readiness, reason, backend_initialized, last_transition_ms). Legacy status field is preserved by mirroring readiness for clients that haven't migrated.

Out of scope

  • The huggingface_hub.list_repo_files offline-mode trip in common/llamafarm_common/model_format.py:185 — that's a separate hardening change. The flight-safety bug is fixed regardless: even if the HF probe is reached, the resulting failure now surfaces on the bus instead of being silently dropped.
  • HTTP-side admission control. Same pattern would apply but flight-control uses Zenoh, so HTTP behavior is unchanged here.
  • The llama.cpp version bump itself.

Test plan

  • nx test edge-runtime (38/38 passing) — 15 new tests across test_backend_state.py and test_zenoh_ipc_state.py, plus the existing test_alias.py suite.
  • python -c "import server" smoke import — server initializes cleanly, backend init succeeds, readiness correctly held at INITIALIZING until lifespan runs.
  • In-process smoke (offline + missing model): drove ZenohIPC with BackendState.set(UNAVAILABLE, reason="backend init failed: ImportError(huggingface_hub list_repo_files offline)"). Heartbeat published on local/llm/status with readiness: "unavailable", status: "unavailable", and the reason string. The subsequent request was refused on local/llm/response with error: "backend_unavailable" and _inference_fn was never invoked.
  • In-process smoke (happy path): drove ZenohIPC with mark_backend_initialized() + set(READY). Heartbeat published readiness: "ready", backend_initialized: true; the request flowed through to _inference_fn and the response payload contained content with no error field.
  • CI: full Linux/macOS/Windows matrix green (53/53 checks SUCCESS — Build CLI all 5 targets, Build Native Binaries PyApp all matrices, Build Desktop App all targets, Test Python Components 3.11/3.12 across config/rag/server/runtimes/universal/runtimes/edge, CodeQL go/python/js-ts/actions, E2E ubuntu/macos/windows). PR is MERGEABLE / BLOCKED only on REVIEW_REQUIRED.

Backend-init failures and preload failures were swallowed at the
flight-control layer: the Zenoh heartbeat hardcoded "ready" and
_handle_request accepted every request, so a dead llama.cpp backend
silently dropped LLM requests from flight-control. Add a process-wide
BackendState wired through ZenohIPC so the heartbeat publishes honest
readiness ("initializing" / "ready" / "degraded" / "unavailable") and
the request handler refuses non-ready traffic with an explicit
backend_unavailable error instead of passing it to inference.

github-actions Bot commented Apr 29, 2026

All E2E Tests Passed!

Test Results by Platform

OS Status
ubuntu-latest ✅ Passed
macos-latest ✅ Passed
windows-latest ✅ Passed

This comment was automatically generated by the E2E Tests workflow.

@qodo-free-for-open-source-projects

Review Summary by Qodo

Surface backend readiness on Zenoh heartbeat with admission control

🐞 Bug fix ✨ Enhancement


Walkthroughs

Description
• Add process-wide BackendState to track and surface backend readiness
  - Implements Readiness enum: INITIALIZING, READY, DEGRADED, UNAVAILABLE
  - Thread-safe state mutations with snapshot API for heartbeat publishing
• Wire backend state through ZenohIPC for honest status reporting
  - Heartbeat now publishes real readiness instead of hardcoded "ready"
  - Request handler refuses non-ready traffic with explicit backend_unavailable error
• Record backend initialization and preload outcomes on startup
  - _init_llama_backend() marks success or sets UNAVAILABLE with reason
  - Lifespan finalizes readiness from preload results: all-ok → READY, partial → DEGRADED,
  all-failed → UNAVAILABLE
• Narrow admission control for runtime failures
  - Only HuggingFace offline-mode and missing-entry errors flip backend to DEGRADED
  - Generic inference errors (bad prompt, decode) do not poison backend status
Diagram

```mermaid
flowchart LR
  Init["_init_llama_backend()"]
  Preload["Preload models"]
  State["BackendState"]
  Heartbeat["_heartbeat_loop()"]
  Handler["_handle_request()"]
  Response["Response/Status"]

  Init -- "mark_initialized or set UNAVAILABLE" --> State
  Preload -- "finalize readiness" --> State
  State -- "snapshot()" --> Heartbeat
  State -- "snapshot()" --> Handler
  Heartbeat -- "publish readiness" --> Response
  Handler -- "refuse if not READY" --> Response
```


File Changes

1. runtimes/edge/core/backend_state.py ✨ Enhancement +59/-0

New backend state tracking module

• New module providing process-wide BackendState dataclass with Readiness enum
• Implements thread-safe set(), mark_backend_initialized(), and snapshot() methods
• Singleton BACKEND_STATE instance for use across the runtime
• Tracks readiness, reason, backend_initialized flag, and last_transition_ms timestamp

runtimes/edge/core/backend_state.py


2. runtimes/edge/server.py ✨ Enhancement +37/-2

Wire backend state through initialization and lifespan

• Import BACKEND_STATE and Readiness from new backend_state module
• _init_llama_backend() now records outcome: success calls mark_backend_initialized(), failures
 set UNAVAILABLE with reason
• lifespan() finalizes backend readiness after preload loop based on success/failure counts
• Pass state_provider and state_setter callbacks to ZenohIPC constructor

runtimes/edge/server.py


3. runtimes/edge/services/zenoh_ipc.py ✨ Enhancement +104/-9

Add admission control and state-aware heartbeat

• Accept optional state_provider and state_setter callables in __init__
• _handle_request() checks readiness before inference; refuses non-ready requests with
 backend_unavailable error
• New _maybe_mark_backend_degraded() method flips state to DEGRADED only for HuggingFace
 offline-mode and missing-entry errors
• _heartbeat_loop() publishes live state snapshot when state_provider is wired; preserves legacy
 hardcoded "ready" when not

runtimes/edge/services/zenoh_ipc.py


View more (2)
4. runtimes/edge/tests/test_backend_state.py 🧪 Tests +69/-0

Unit tests for BackendState

• Test initial state is INITIALIZING with empty reason and backend_initialized=False
• Test set() and mark_backend_initialized() mutations
• Test last_transition_ms advances on state transitions
• Test snapshot is JSON-serializable for heartbeat publishing

runtimes/edge/tests/test_backend_state.py


5. runtimes/edge/tests/test_zenoh_ipc_state.py 🧪 Tests +202/-0

Integration tests for ZenohIPC state handling

• Test admission control refuses requests when readiness is INITIALIZING, UNAVAILABLE, or
 DEGRADED
• Test requests pass through to inference when readiness is READY
• Test HuggingFace offline-mode errors flip backend to DEGRADED
• Test generic inference errors do not poison backend status
• Test heartbeat publishes current state snapshot with all fields
• Test legacy mode (no state_provider) preserves hardcoded "ready" behavior

runtimes/edge/tests/test_zenoh_ipc_state.py



@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Apr 29, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Long pytest test signatures — 📘 Rule violation · ⚙ Maintainability
Description
Several newly added test function definitions exceed the repository’s 88-character Python
line-length convention, which can fail ruff/format checks. This reduces consistency and may block
CI/linting for the PR.
Code

runtimes/edge/tests/test_zenoh_ipc_state.py[R59-92]

+    async def test_refuses_when_initializing(self, ipc, fake_session, inference_calls):
+        await ipc._handle_request({"request_id": "r1", "model": "m"})
+        assert inference_calls == []
+        assert len(fake_session.puts) == 1
+        topic, payload = fake_session.puts[0]
+        assert topic == "local/llm/response"
+        assert payload["error"] == "backend_unavailable"
+        assert payload["readiness"] == "initializing"
+        assert payload["request_id"] == "r1"
+
+    @pytest.mark.asyncio
+    async def test_refuses_when_unavailable(self, ipc, state, fake_session, inference_calls):
+        state.set(Readiness.UNAVAILABLE, "backend init failed: missing lib")
+        await ipc._handle_request({"request_id": "r2", "model": "m"})
+        assert inference_calls == []
+        topic, payload = fake_session.puts[0]
+        assert payload["error"] == "backend_unavailable"
+        assert payload["reason"] == "backend init failed: missing lib"
+        assert payload["readiness"] == "unavailable"
+
+    @pytest.mark.asyncio
+    async def test_refuses_when_degraded(self, ipc, state, fake_session, inference_calls):
+        state.set(Readiness.DEGRADED, "preload failed: m1")
+        await ipc._handle_request({"request_id": "r3", "model": "m"})
+        assert inference_calls == []
+        topic, payload = fake_session.puts[0]
+        assert payload["error"] == "backend_unavailable"
+        assert payload["readiness"] == "degraded"
+
+    @pytest.mark.asyncio
+    async def test_passes_through_when_ready(self, ipc, state, fake_session, inference_calls):
+        state.mark_backend_initialized()
+        state.set(Readiness.READY)
+        await ipc._handle_request({"request_id": "r4", "model": "m"})
Evidence
PR Compliance ID 1 requires Python code to respect an 88 character line length. The added test
definitions include long single-line function signatures (and related long lines) that exceed this
limit.

AGENTS.md
runtimes/edge/tests/test_zenoh_ipc_state.py[59-59]
runtimes/edge/tests/test_zenoh_ipc_state.py[70-70]
runtimes/edge/tests/test_zenoh_ipc_state.py[89-89]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Some new pytest test definitions exceed the 88-character line-length convention required by repository formatting (ruff).
## Issue Context
In `runtimes/edge/tests/test_zenoh_ipc_state.py`, several `async def test_...` signatures are written on a single long line.
## Fix Focus Areas
- runtimes/edge/tests/test_zenoh_ipc_state.py[59-92]




Remediation recommended

2. Sticky DEGRADED blocks Zenoh — 🐞 Bug · ☼ Reliability
Description
ZenohIPC._maybe_mark_backend_degraded() sets global readiness to DEGRADED on
OfflineModeIsEnabled/LocalEntryNotFoundError, and _handle_request rejects all non-"ready" states, so
a single such failure can cause all subsequent Zenoh inference requests to be refused until
readiness is explicitly reset (this codebase only sets READY during startup).
Code

runtimes/edge/services/zenoh_ipc.py[R241-268]

+    def _maybe_mark_backend_degraded(self, exc: BaseException, model: str) -> None:
+        """If exc indicates the backend itself is unhealthy, flip state to
+        DEGRADED so the next heartbeat reflects it.
+
+        Narrowly scoped to errors that mean "the runtime cannot serve any
+        request" — not generic inference failures (bad prompt, decode error)
+        which should not poison the whole backend's status.
+        """
+        if self._state_setter is None:
+            return
+        # Lazy imports — huggingface_hub may not be installed in minimal
+        # builds, and we want this module importable without it.
+        try:
+            from huggingface_hub.errors import (  # type: ignore[import-not-found]
+                LocalEntryNotFoundError,
+                OfflineModeIsEnabled,
+            )
+        except ImportError:
+            return
+
+        if isinstance(exc, (OfflineModeIsEnabled, LocalEntryNotFoundError)):
+            # Import here to avoid a circular-import risk at module load.
+            from core.backend_state import Readiness
+
+            self._state_setter(
+                Readiness.DEGRADED,
+                f"network probe failed in offline mode for {model}: {exc}",
+            )
Evidence
Admission control refuses any request when readiness!=ready, while runtime exceptions in
_maybe_mark_backend_degraded can move readiness to DEGRADED. In the current codebase, readiness is
only set to READY in server lifespan startup finalization, and HTTP handling treats these HF
exceptions as request/model-scoped 404s (suggesting they don’t necessarily represent a permanently
unhealthy backend). Together, this makes DEGRADED effectively “sticky” in production unless
something external resets it.

runtimes/edge/services/zenoh_ipc.py[177-236]
runtimes/edge/services/zenoh_ipc.py[241-268]
runtimes/edge/server.py[526-544]
runtimes/edge/routers/chat_completions/service.py[1539-1573]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ZenohIPC` can set `Readiness.DEGRADED` on certain HF/model-resolution exceptions, and because admission control refuses all non-READY states, Zenoh inference can remain blocked indefinitely unless something resets readiness to READY.
### Issue Context
- `_handle_request` refuses all requests when `readiness != "ready"`.
- `_maybe_mark_backend_degraded` sets global readiness to DEGRADED for `OfflineModeIsEnabled` / `LocalEntryNotFoundError`.
- In this repo, READY is only set during startup finalization; there is no runtime path that restores READY after transient failures.
### Fix focus areas
- Decide/implement an explicit recovery policy, e.g.:
- do not flip global readiness for request-scoped model resolution errors; OR
- allow requests in DEGRADED (make it advisory) while still surfacing state via heartbeat; OR
- automatically restore READY after a successful inference / after a cooldown.
- runtimes/edge/services/zenoh_ipc.py[177-268]
- runtimes/edge/server.py[526-544]



3. Stale transition timestamp — 🐞 Bug · ◔ Observability
Description
BackendState.mark_backend_initialized() flips backend_initialized without updating
last_transition_ms, so heartbeats can publish an initialization state change without a corresponding
transition timestamp update.
Code

runtimes/edge/core/backend_state.py[R45-48]

+    def mark_backend_initialized(self) -> None:
+        with self._lock:
+            self.backend_initialized = True
+
Evidence
The snapshot published to Zenoh includes last_transition_ms, but mark_backend_initialized() does not
update it; the heartbeat republishes this snapshot as-is, so consumers cannot reliably detect when
backend_initialized changed based on last_transition_ms.

runtimes/edge/core/backend_state.py[39-56]
runtimes/edge/services/zenoh_ipc.py[291-298]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`BackendState.mark_backend_initialized()` mutates state but does not update `last_transition_ms`, making `snapshot()` inconsistent for consumers that use `last_transition_ms` to detect state changes.
### Issue Context
Zenoh heartbeat publishes the full snapshot (including `last_transition_ms`). If `backend_initialized` changes without bumping `last_transition_ms`, observers may miss that transition.
### Fix focus areas
- Update `last_transition_ms` inside `mark_backend_initialized()` (or add a dedicated `backend_initialized_ms` field and publish that).
- runtimes/edge/core/backend_state.py[45-56]




Comment thread runtimes/edge/tests/test_zenoh_ipc_state.py Dismissed

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 5 files

Split the readiness-finalization block out of lifespan() into a
_finalize_backend_readiness() helper so it can be unit-tested without
standing up the full FastAPI app, KV cache manager, and Zenoh session.
Behavior is unchanged.

Add tests/test_server_lifecycle.py covering the two gaps in the
original PR's coverage:

- _init_llama_backend(): success marks backend_initialized and leaves
  readiness at INITIALIZING for finalize to decide; ImportError flips
  to UNAVAILABLE with "llamafarm_llama not installed"; generic
  Exception flips to UNAVAILABLE and propagates the underlying message
  so operators can diagnose.
- _finalize_backend_readiness(): UNAVAILABLE from a failed backend
  init survives — the finalize step must not mask the original reason
  with preload-derived state. All four preload outcomes (none, all-ok,
  partial-fail, all-fail) project onto READY/DEGRADED/UNAVAILABLE as
  documented.
Address two qodo review findings on PR #830:

1. Sticky DEGRADED bug. _maybe_mark_backend_degraded() flipped global
   readiness to DEGRADED on OfflineModeIsEnabled / LocalEntryNotFoundError,
   and admission control refuses every non-READY state. Since nothing in
   the codebase resets readiness to READY at runtime, a single un-cached
   model would permanently block inference for every other model on the
   bus until process restart. The HTTP path already treats these as
   request-scoped 404s; align Zenoh to the same scope by removing the
   state-flip path entirely. Drop the now-unused state_setter constructor
   parameter from ZenohIPC and the corresponding kwarg in server.py.

2. Stale last_transition_ms. mark_backend_initialized() flipped the
   backend_initialized flag without updating last_transition_ms, so
   heartbeat consumers diffing on the timestamp could not detect the
   initialization transition. Bump the timestamp inside the lock.

Tests: replace TestRuntimeFailureMarksDegraded with
TestRuntimeFailuresStayRequestScoped pinning the inverted invariant
(state stays READY through both HF and generic exceptions). Add
test_mark_backend_initialized_advances_transition_ms for the timestamp
bump. 47/47 passing.
Without explicit client mode, zenoh.open() returned a peer-mode session
that silently failed to attach to the comms router — heartbeat publishes
to local/llm/status became no-ops and flight-control never saw the
backend-init signal this PR added.
@rachmlenig

Cross-repo dependency: llamadrone needs a deploy-template fix before this PR's payload is observable on hardware.

End-to-end testing on drone-60ee4 revealed that the new local/llm/status payload from this PR was correctly produced by the heartbeat task but never reached the Zenoh bus. Two issues, both pre-existing:

  1. Fixed in this PR (67a4a378) — services/zenoh_ipc.py opened the Zenoh session without mode: "client". zenoh-python's default peer mode silently tolerates an unreachable endpoint, so zenoh.open() returned a session that wasn't actually attached to the comms router. Every session.put() was a no-op. Added the one-line config + a comment explaining why.

  2. Fixed in llamadrone/llamadrone#73 (separate repo) — even with client mode, the comms socket at /run/arc/zenoh.sock is root:root 0755. Stream-socket connect() needs write perm, so the non-root USER edge in this image's Dockerfile gets EACCES. Every other arc service in llamadrone runs as root; that PR adds user: "0:0" to the edge-runtime compose entry to match.

Until llamadrone#73 lands, this PR's headline behavior (honest readiness on the bus for flight-control) won't be observable in deployed environments. The unit/integration tests here still pass because they mock the Zenoh transport.

Verified on drone-60ee4 with both fixes applied — captured payloads on local/llm/status:

{
  "service": "edge-runtime",
  "status": "ready",
  "readiness": "ready",
  "reason": "",
  "backend_initialized": true,
  "last_transition_ms": 1777571665212,
  "timestamp_ms": 1777571683403
}

Most edge-runtime adopters run it as a standalone Pi/Jetson inference
server and don't need a pub/sub bus. Zenoh exists for drone/flight-control
deployments (e.g. Arc), so it should be opt-in rather than always-on.

- server.py: only construct and start ZenohIPC when LLAMAFARM_ZENOH_ENABLED
  is set to a truthy value (1/true/yes). Off by default.
- services/zenoh_ipc.py: drop the inner ZENOH_ENABLED kill switch — the
  outer gate is now the single source of truth, which avoids two flags
  with conflicting defaults.
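The outer gate described above boils down to a single env-var check — the helper name is illustrative (the PR inlines the check in server.py), but the truthy set matches the values listed in the commit message:

```python
import os


def zenoh_enabled(env=None) -> bool:
    """Sketch of the LLAMAFARM_ZENOH_ENABLED gate: opt-in, off by default.
    Accepts 1/true/yes (case-insensitive); anything else disables Zenoh."""
    env = os.environ if env is None else env
    value = env.get("LLAMAFARM_ZENOH_ENABLED", "")
    return value.strip().lower() in {"1", "true", "yes"}
```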
@rachmlenig rachmlenig requested review from BobbyRadford and mhamann May 1, 2026 18:33
@rachmlenig rachmlenig added the merge-when-approved Indicates that the author wants to merge this PR and is not planning to add further commits. label May 1, 2026