fix(edge): surface backend-init state on `local/llm/status` (#830)
rachmlenig wants to merge 5 commits into main
Conversation
Backend-init failures and preload failures were swallowed at the
flight-control layer: the Zenoh heartbeat hardcoded "ready" and
_handle_request accepted every request, so a dead llama.cpp backend
silently dropped LLM requests from flight-control. Add a process-wide
BackendState wired through ZenohIPC so the heartbeat publishes honest
readiness ("initializing" / "ready" / "degraded" / "unavailable") and
the request handler refuses non-ready traffic with an explicit
backend_unavailable error instead of passing it to inference.
✅ All E2E Tests Passed!

*This comment was automatically generated by the E2E Tests workflow.*
Review Summary by Qodo: Surface backend readiness on Zenoh heartbeat with admission control

**Walkthrough**

- Add process-wide `BackendState` to track and surface backend readiness: implements a `Readiness` enum (`INITIALIZING`, `READY`, `DEGRADED`, `UNAVAILABLE`); thread-safe state mutations with a snapshot API for heartbeat publishing.
- Wire backend state through `ZenohIPC` for honest status reporting: the heartbeat now publishes real readiness instead of a hardcoded "ready"; the request handler refuses non-ready traffic with an explicit `backend_unavailable` error.
- Record backend initialization and preload outcomes on startup: `_init_llama_backend()` marks success or sets `UNAVAILABLE` with a reason; lifespan finalizes readiness from preload results (all-ok → `READY`, partial → `DEGRADED`, all-failed → `UNAVAILABLE`).
- Narrow admission control for runtime failures: only HuggingFace offline-mode and missing-entry errors flip the backend to `DEGRADED`; generic inference errors (bad prompt, decode) do not poison backend status.

**Diagram**

```mermaid
flowchart LR
    Init["_init_llama_backend()"]
    Preload["Preload models"]
    State["BackendState"]
    Heartbeat["_heartbeat_loop()"]
    Handler["_handle_request()"]
    Response["Response/Status"]
    Init -- "mark_initialized or set UNAVAILABLE" --> State
    Preload -- "finalize readiness" --> State
    State -- "snapshot()" --> Heartbeat
    State -- "snapshot()" --> Handler
    Heartbeat -- "publish readiness" --> Response
    Handler -- "refuse if not READY" --> Response
```
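The state object at the center of the diagram can be sketched as below. This is a minimal illustration, not the PR's actual file: the names (`Readiness`, `BackendState`, `BACKEND_STATE`, `set`, `mark_backend_initialized`, `snapshot`) match the description, but field layout and details are assumptions.

```python
import threading
import time
from dataclasses import dataclass, field
from enum import Enum


class Readiness(str, Enum):
    INITIALIZING = "initializing"
    READY = "ready"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"


@dataclass
class BackendState:
    """Process-wide readiness tracker; every mutation holds the lock."""
    readiness: Readiness = Readiness.INITIALIZING
    reason: str = ""
    backend_initialized: bool = False
    last_transition_ms: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def set(self, readiness: Readiness, reason: str = "") -> None:
        with self._lock:
            self.readiness = readiness
            self.reason = reason
            self.last_transition_ms = int(time.time() * 1000)

    def mark_backend_initialized(self) -> None:
        with self._lock:
            self.backend_initialized = True
            # bump the timestamp so heartbeat consumers diffing on it
            # can detect the initialization transition
            self.last_transition_ms = int(time.time() * 1000)

    def snapshot(self) -> dict:
        """Consistent copy for the heartbeat and admission control."""
        with self._lock:
            return {
                "readiness": self.readiness.value,
                "reason": self.reason,
                "backend_initialized": self.backend_initialized,
                "last_transition_ms": self.last_transition_ms,
            }


# singleton shared by server startup, the heartbeat, and the request handler
BACKEND_STATE = BackendState()
```

The snapshot-under-lock pattern is what lets the heartbeat loop and request handler read a consistent view without holding the lock across I/O.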
**File Changes**

1. `runtimes/edge/core/backend_state.py`

Code Review by Qodo
Split the readiness-finalization block out of `lifespan()` into a `_finalize_backend_readiness()` helper so it can be unit-tested without standing up the full FastAPI app, KV cache manager, and Zenoh session. Behavior is unchanged.

Add `tests/test_server_lifecycle.py` covering the two gaps in the original PR's coverage:

- `_init_llama_backend()`: success marks `backend_initialized` and leaves readiness at `INITIALIZING` for finalize to decide; `ImportError` flips to `UNAVAILABLE` with "llamafarm_llama not installed"; a generic `Exception` flips to `UNAVAILABLE` and propagates the underlying message so operators can diagnose.
- `_finalize_backend_readiness()`: `UNAVAILABLE` from a failed backend init survives; the finalize step must not mask the original reason with preload-derived state. All four preload outcomes (none, all-ok, partial-fail, all-fail) project onto `READY` / `DEGRADED` / `UNAVAILABLE` as documented.
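The projection described above can be written as a pure function, which is what makes it testable without the app. This is a sketch of the documented mapping, not the PR's code; the function name and signature are illustrative.

```python
def finalize_backend_readiness(backend_initialized: bool,
                               ok_models: list,
                               failed_models: list):
    """Project startup outcomes onto (readiness, reason).

    Illustrative sketch: a failed backend init must win over preload
    results so the original failure reason is not masked.
    """
    if not backend_initialized:
        return "unavailable", "backend init failed"
    if failed_models and ok_models:
        # partial failure: serve what loaded, but advertise degradation
        return "degraded", "preload failed: " + ", ".join(failed_models)
    if failed_models:
        return "unavailable", "all preloads failed"
    # covers both all-ok and "no models requested"
    return "ready", ""
```

Keeping the decision table in one place means the four preload outcomes can each be pinned with a one-line assertion.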
Address two Qodo review findings on PR #830:

1. Sticky `DEGRADED` bug. `_maybe_mark_backend_degraded()` flipped global readiness to `DEGRADED` on `OfflineModeIsEnabled` / `LocalEntryNotFoundError`, and admission control refuses every non-`READY` state. Since nothing in the codebase resets readiness to `READY` at runtime, a single un-cached model would permanently block inference for every other model on the bus until process restart. The HTTP path already treats these as request-scoped 404s; align Zenoh to the same scope by removing the state-flip path entirely. Drop the now-unused `state_setter` constructor parameter from `ZenohIPC` and the corresponding kwarg in `server.py`.
2. Stale `last_transition_ms`. `mark_backend_initialized()` flipped the `backend_initialized` flag without updating `last_transition_ms`, so heartbeat consumers diffing on the timestamp could not detect the initialization transition. Bump the timestamp inside the lock.

Tests: replace `TestRuntimeFailureMarksDegraded` with `TestRuntimeFailuresStayRequestScoped`, pinning the inverted invariant (state stays `READY` through both HF and generic exceptions). Add `test_mark_backend_initialized_advances_transition_ms` for the timestamp bump. 47/47 passing.
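The request-scoped invariant from finding 1 can be sketched as an error classifier that never touches global state. This is illustrative, not the PR's code: the function name is hypothetical, and the exception names mirror huggingface_hub's offline/cache-miss errors as described above.

```python
def classify_inference_error(exc: Exception) -> dict:
    """Map a runtime inference failure to a request-scoped error payload.

    Post-fix invariant: no branch here mutates global readiness, so one
    un-cached model cannot block inference for every model on the bus.
    """
    name = type(exc).__name__
    if name in ("OfflineModeIsEnabled", "LocalEntryNotFoundError"):
        # model not in the local cache: a 404-style, per-request failure,
        # matching how the HTTP path already scopes these errors
        return {"error": "model_not_found", "detail": str(exc)}
    # generic failures (bad prompt, decode error) are also per-request
    return {"error": "inference_failed", "detail": str(exc)}
```

Matching on the exception type name rather than importing huggingface_hub is just a device to keep the sketch self-contained; real code would catch the concrete exception classes.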
Without explicit client mode, `zenoh.open()` returned a peer-mode session that silently failed to attach to the comms router: heartbeat publishes to `local/llm/status` became no-ops, and flight-control never saw the backend-init signal this PR added.
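A Zenoh session can be pinned to client mode through its JSON5 configuration; the `mode` and `connect.endpoints` keys follow Zenoh's documented config schema, but the router address below is purely illustrative:

```json5
{
  mode: "client",                        // attach to a router instead of forming a peer mesh
  connect: {
    endpoints: ["tcp/comms-router:7447"] // router address is an assumption for this sketch
  }
}
```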
Cross-repo dependency: llamadrone needs a deploy-template fix before this PR's payload is observable on hardware. End-to-end testing on

Until llamadrone#73 lands, this PR's headline behavior (honest readiness on the bus for flight-control) won't be observable in deployed environments. The unit/integration tests here still pass because they mock the Zenoh transport. Verified on:

```json
{
  "service": "edge-runtime",
  "status": "ready",
  "readiness": "ready",
  "reason": "",
  "backend_initialized": true,
  "last_transition_ms": 1777571665212,
  "timestamp_ms": 1777571683403
}
```
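A payload like the one above can be assembled from a state snapshot plus a fresh timestamp. This is a sketch whose field names mirror that verified payload; the helper name is hypothetical.

```python
import time


def build_heartbeat(snapshot: dict) -> dict:
    """Assemble a heartbeat payload for local/llm/status.

    Illustrative: the legacy "status" field mirrors readiness for
    clients that haven't migrated to the new field.
    """
    return {
        "service": "edge-runtime",
        "status": snapshot["readiness"],    # legacy field, mirrors readiness
        "readiness": snapshot["readiness"],
        "reason": snapshot["reason"],
        "backend_initialized": snapshot["backend_initialized"],
        "last_transition_ms": snapshot["last_transition_ms"],
        "timestamp_ms": int(time.time() * 1000),
    }
```

Separating `last_transition_ms` (when state last changed) from `timestamp_ms` (when this heartbeat was built) is what lets consumers distinguish "still alive" from "something changed".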
Most edge-runtime adopters run it as a standalone Pi/Jetson inference server and don't need a pub/sub bus. Zenoh exists for drone/flight-control deployments (e.g. Arc), so it should be opt-in rather than always-on.

- `server.py`: only construct and start `ZenohIPC` when `LLAMAFARM_ZENOH_ENABLED` is set to a truthy value (1/true/yes). Off by default.
- `services/zenoh_ipc.py`: drop the inner `ZENOH_ENABLED` kill switch; the outer gate is now the single source of truth, which avoids two flags with conflicting defaults.
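The truthy-value gate described above amounts to a few lines; this sketch is illustrative (the helper name and the optional `env` parameter for testability are assumptions), but the accepted values match the comment.

```python
import os


def zenoh_enabled(env=None) -> bool:
    """Opt-in gate for Zenoh IPC: off unless LLAMAFARM_ZENOH_ENABLED
    is set to 1/true/yes (case-insensitive)."""
    env = os.environ if env is None else env
    value = env.get("LLAMAFARM_ZENOH_ENABLED", "")
    return value.strip().lower() in ("1", "true", "yes")
```

Parsing an explicit allow-list of truthy strings (rather than `bool(value)`) means `LLAMAFARM_ZENOH_ENABLED=0` stays off, which is usually what operators expect.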
Summary
- Surface backend readiness on the `local/llm/status` heartbeat so flight-control can refuse LLM-dependent commands when the runtime is unhealthy, instead of issuing them and seeing them silently dropped at the inference layer.
- `ZenohIPC._handle_request`: requests are now refused with `error: "backend_unavailable"` (and a reason string) when readiness is not `READY`, instead of being passed to inference where they fail after a long timeout or, worse, succeed against a half-initialized backend.

Background — what was broken
The edge runtime serves both HTTP clients and a Zenoh IPC bus used by drone flight-control. Three independent shortcuts combined into a flight-safety bug:
1. `_init_llama_backend()` in `server.py` swallowed all init failures with a `logger.warning`: no flag, no re-raise, no signal to lifespan.
2. `ZenohIPC._heartbeat_loop` published a hardcoded `"status": "ready"` every 5 seconds, regardless of whether llama.cpp was actually up.
3. `ZenohIPC._handle_request` accepted every incoming request and passed it to `_inference_fn`, where (in offline mode) `huggingface_hub.list_repo_files` would raise `OfflineModeIsEnabled` mid-request. The error was logged and a generic `{"error": "inference failed"}` was published, but the heartbeat continued to claim `"ready"`.

From flight-control's perspective: the bus reports a healthy LLM, requests are accepted, and they fail. There was no signal to refuse LLM-dependent commands. This PR fixes that; the version mismatch incident was the trigger, but the underlying flight-safety bug was the silent drop pattern.
What changed
New: `runtimes/edge/core/backend_state.py`

A process-wide `BackendState` dataclass with a `Readiness` enum (`INITIALIZING` / `READY` / `DEGRADED` / `UNAVAILABLE`), a `threading.Lock`-guarded `set` / `mark_backend_initialized` / `snapshot` API, and a singleton `BACKEND_STATE`.

`runtimes/edge/server.py`

- `_init_llama_backend()` now records outcome on `BACKEND_STATE`: success → `mark_backend_initialized()`; `ImportError` / generic `Exception` → `set(UNAVAILABLE, reason=...)`.
- `lifespan` finalizes readiness from preload outcomes: all preloads ok → `READY`; partial failures → `DEGRADED` with a reason listing failed model IDs; all failed → `UNAVAILABLE`.
- `ZenohIPC` is constructed with `state_provider=BACKEND_STATE.snapshot`. Zenoh IPC is gated behind `LLAMAFARM_ZENOH_ENABLED` (default off).

`runtimes/edge/services/zenoh_ipc.py`

- `__init__` accepts an optional `state_provider` (defaults preserve legacy behavior for any other caller).
- `_handle_request` refuses non-`READY` requests immediately with `error: "backend_unavailable"`, `reason`, and `readiness` fields, without calling `_inference_fn`.
- `_heartbeat_loop` publishes the live snapshot (`readiness`, `reason`, `backend_initialized`, `last_transition_ms`). The legacy `status` field is preserved by mirroring readiness for clients that haven't migrated.

Out of scope
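The admission-control path in `_handle_request` can be sketched as a small function over the state snapshot. This is illustrative, not the PR's code: the helper name and signature are assumptions, but the refusal payload fields match the description above.

```python
def admit_or_refuse(snapshot: dict, request: dict, inference_fn) -> dict:
    """Admission control for the Zenoh request handler (sketch).

    Refuses traffic with an explicit backend_unavailable error unless
    readiness is "ready", so the inference function is never invoked
    against a dead or half-initialized backend.
    """
    if snapshot["readiness"] != "ready":
        return {
            "error": "backend_unavailable",
            "reason": snapshot.get("reason", ""),
            "readiness": snapshot["readiness"],
        }
    return inference_fn(request)
```

The key property is that the refusal path returns before `inference_fn` is ever called, which is exactly what the manual tests below verify.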
The `huggingface_hub.list_repo_files` offline-mode trip in `common/llamafarm_common/model_format.py:185` is a separate hardening change. The flight-safety bug is fixed regardless: even if the HF probe is reached, the resulting failure now surfaces on the bus instead of being silently dropped.

Test plan
- `nx test edge-runtime` (38/38 passing): 15 new tests across `test_backend_state.py` and `test_zenoh_ipc_state.py`, plus the existing `test_alias.py` suite.
- `python -c "import server"` smoke import: server initializes cleanly, backend init succeeds, readiness correctly held at `INITIALIZING` until `lifespan` runs.
- `ZenohIPC` with `BackendState.set(UNAVAILABLE, reason="backend init failed: ImportError(huggingface_hub list_repo_files offline)")`: heartbeat published on `local/llm/status` with `readiness: "unavailable"`, `status: "unavailable"`, and the reason string. The subsequent request was refused on `local/llm/response` with `error: "backend_unavailable"`, and `_inference_fn` was never invoked.
- `ZenohIPC` with `mark_backend_initialized()` + `set(READY)`: heartbeat published `readiness: "ready"`, `backend_initialized: true`; the request flowed through to `_inference_fn` and the response payload contained `content` with no `error` field.
- `MERGEABLE` / `BLOCKED` only on `REVIEW_REQUIRED`.