
fix(common): guard detect_model_format HF fallback in offline mode #831

Open

rachmlenig wants to merge 3 commits into main from fix-common-guard-offline-detect-model-format

Conversation

@rachmlenig
Contributor

Summary

  • detect_model_format in llamafarm_common.model_format previously fell through to huggingface_hub.HfApi.list_repo_files even when LLAMAFARM_OFFLINE=1, producing an OfflineModeIsEnabled traceback from deep inside huggingface_hub instead of the structured FileNotFoundError that list_gguf_files (in the same package) already raises.
  • Add a matching offline guard mirroring the list_gguf_files pattern.
  • Map the new FileNotFoundError to the same 404 model_not_cached response that the existing OfflineModeIsEnabled branch produces in the edge runtime's chat-completions handler, so the API contract is unchanged.

Background — what was broken

common/llamafarm_common/model_format.py:detect_model_format walks four resolution tiers (a sketch follows the list):

  1. model_id.endswith(".gguf") short-circuit — file extension only, no I/O.
  2. _check_local_cache_for_model — scans the HuggingFace cache.
  3. resolve_from_model_dir — checks $LLAMAFARM_MODEL_DIR.
  4. HfApi().list_repo_files(...) — HTTP call to huggingface.co.
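As a sketch, the tier order looks roughly like this (helper signatures and the _format_from_repo_files name are assumptions based on the names above, not the literal source):

# Sketch only; signatures, return conventions, and _format_from_repo_files
# are illustrative assumptions, not the model_format.py source.
from huggingface_hub import HfApi

def detect_model_format(model_id: str) -> str:
    if model_id.endswith(".gguf"):                # tier 1: extension, no I/O
        return "gguf"
    fmt = _check_local_cache_for_model(model_id)  # tier 2: HF cache scan
    if fmt is not None:
        return fmt
    fmt = resolve_from_model_dir(model_id)        # tier 3: $LLAMAFARM_MODEL_DIR
    if fmt is not None:
        return fmt
    files = HfApi().list_repo_files(model_id)     # tier 4: network fallback
    return _format_from_repo_files(files)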

Tier 4 had no offline guard. list_gguf_files in the same package (model_utils.py:419) already raises a structured FileNotFoundError when LLAMAFARM_OFFLINE=1 is set. _binary.py in llamafarm-llama does the same via _is_offline() / get_lib_path() (_binary.py:323,387). detect_model_format was the outlier — likely missed during the refactor train (3eed3558 fix(common): remove filesystem probing from detect_model_format entirely, e29ecde9 fix(common): check LLAMAFARM_MODEL_DIR before HuggingFace API in detect_model_format).

Verified the gap exists in the released v0.0.32 via git show v0.0.32:common/llamafarm_common/model_format.py: list_repo_files is called unconditionally on line 185.

When the offline guard is missing, the call falls into huggingface_hub which sees HF_HUB_OFFLINE=1 (propagated from LLAMAFARM_OFFLINE by our bootstrap) and raises OfflineModeIsEnabled. The exception bubbles up as a noisy traceback when a clean FileNotFoundError with operator guidance ("place the model under $LLAMAFARM_MODEL_DIR or pre-populate the HuggingFace cache") would be much more useful — and consistent with the rest of the package.
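For reference, the bootstrap propagation amounts to something like this (a minimal sketch; the bootstrap itself is not part of this diff and the function name is hypothetical):

import os

def _propagate_offline_env() -> None:
    # Mirror the project-level flag into huggingface_hub's own switch so
    # any HF network call raises OfflineModeIsEnabled instead of going out.
    if os.environ.get("LLAMAFARM_OFFLINE") == "1":
        os.environ["HF_HUB_OFFLINE"] = "1"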

What changed

common/llamafarm_common/model_format.py
Added from . import offline_mode and a guard between the LLAMAFARM_MODEL_DIR check and the HfApi() construction:

if offline_mode.is_offline():
    raise FileNotFoundError(
        f"detect_model_format({base_model_id!r}) refused in offline mode "
        f"(LLAMAFARM_OFFLINE=1). Place the model under $LLAMAFARM_MODEL_DIR "
        f"or pre-populate the HuggingFace cache."
    )

This mirrors the existing list_gguf_files shape exactly.
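For context, offline_mode.is_offline() is assumed here to be a thin env-var check along these lines (a sketch; the real helper in llamafarm_common.offline_mode may accept a different truthy set):

import os

def is_offline() -> bool:
    # Accepted truthy values are an assumption of this sketch.
    return os.environ.get("LLAMAFARM_OFFLINE", "").strip().lower() in ("1", "true", "yes")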

runtimes/edge/routers/chat_completions/service.py
The existing handler at line 1562 catches huggingface_hub.errors.OfflineModeIsEnabled and converts it to a 404 model_not_cached. With the guard in place, that exception is no longer raised — FileNotFoundError is. Added a parallel isinstance(e, FileNotFoundError) and "LLAMAFARM_OFFLINE" in str(e) branch that produces the same 404 response, so API consumers see no contract change. The LLAMAFARM_OFFLINE substring check keeps this from accidentally swallowing other FileNotFoundErrors (e.g. a missing GGUF file in a non-offline scenario).

The universal runtime's chat handler does not have an analogous OfflineModeIsEnabled branch, so no change is needed there.

common/tests/test_model_format_offline.py (new)
6 tests covering (the first case is sketched after this list):

  • Offline + cache miss + model-dir miss → FileNotFoundError with LLAMAFARM_OFFLINE in the message, and HfApi is never constructed.
  • Offline + local-cache hit → still resolves cleanly without touching the network.
  • Offline + LLAMAFARM_MODEL_DIR hit → still resolves cleanly.
  • Online (no env var) → network path is reached as before.
  • .gguf extension short-circuit works regardless of offline state.
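
A sketch of the first case, with patching details assumed rather than copied from the test file:

import pytest

from llamafarm_common import model_format

def test_offline_miss_raises_without_hfapi(monkeypatch):
    monkeypatch.setenv("LLAMAFARM_OFFLINE", "1")
    # Force both local tiers to miss; attribute names follow the tier
    # walkthrough above, but the patching details are assumptions.
    monkeypatch.setattr(model_format, "_check_local_cache_for_model", lambda *a, **k: None)
    monkeypatch.setattr(model_format, "resolve_from_model_dir", lambda *a, **k: None)

    def _no_hfapi(*a, **k):
        raise AssertionError("HfApi must not be constructed in offline mode")

    monkeypatch.setattr(model_format, "HfApi", _no_hfapi)

    with pytest.raises(FileNotFoundError, match="LLAMAFARM_OFFLINE"):
        model_format.detect_model_format("some-org/some-model")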

Out of scope

  • The flight-safety status-propagation work is in fix(edge): surface backend-init state on local/llm/status #830. This PR is the complementary fix on the common side: even with the status propagation in place, suppressing the OfflineModeIsEnabled traceback noise is its own win for operators.
  • The HF-fallback paths inside list_gguf_files and _binary.py are already correctly guarded; no change needed there.
  • The is_offline() env-var check in _binary.py (_is_offline() / get_lib_path()) is verified present and correct in v0.0.32. It is not a regression.

Test plan

  • cd common && uv run pytest tests/ — 135/135 passing (6 new tests in test_model_format_offline.py).
  • cd runtimes/universal && uv run pytest tests/test_model_format.py — 6/6 passing (existing tests unaffected).
  • cd runtimes/edge && uv run pytest tests/ — 23/23 passing.
  • Manual smoke: with LLAMAFARM_OFFLINE=1 and a model not present locally, hit /v1/chat/completions and confirm a clean 404 model_not_cached response with no OfflineModeIsEnabled traceback in the logs (a scripted version of this check is sketched after this list).
  • CI: full Linux/macOS/Windows matrix.
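
A scripted equivalent of the manual smoke step (host, port, and model name are hypothetical; assumes the requests package is available):

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # host/port are assumptions
    json={
        "model": "some-org/not-cached-model",     # deliberately absent locally
        "messages": [{"role": "user", "content": "hi"}],
    },
    timeout=30,
)
assert resp.status_code == 404
assert resp.json()["detail"]["error"] == "model_not_cached"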

When LLAMAFARM_OFFLINE=1, detect_model_format previously fell through
to huggingface_hub.HfApi.list_repo_files if neither the local HF cache
nor LLAMAFARM_MODEL_DIR resolved the model. That produced a noisy
OfflineModeIsEnabled traceback from inside huggingface_hub instead of
the structured FileNotFoundError that list_gguf_files (in the same
package) already raises.

Add a matching guard before the HfApi() construction so callers see a
consistent offline-mode error format, and update the edge runtime's
chat-completions handler to map the new FileNotFoundError to the same
404 model_not_cached response that the OfflineModeIsEnabled branch
produced — preserving the API contract for clients.
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Guard detect_model_format HF fallback in offline mode

🐞 Bug fix


Walkthroughs

Description
• Add offline-mode guard to detect_model_format to prevent HuggingFace API calls
• Raise structured FileNotFoundError instead of OfflineModeIsEnabled traceback
• Map new FileNotFoundError to existing 404 response in edge runtime handler
• Add comprehensive offline-mode test suite for model format detection
Diagram
flowchart LR
  A["detect_model_format called<br/>with LLAMAFARM_OFFLINE=1"] --> B{"Check local cache<br/>or LLAMAFARM_MODEL_DIR"}
  B -->|Hit| C["Return format<br/>no API call"]
  B -->|Miss| D{"Offline guard<br/>is_offline()"}
  D -->|True| E["Raise FileNotFoundError<br/>with guidance"]
  D -->|False| F["Call HfApi<br/>list_repo_files"]
  F --> G["Return format"]
  E --> H["Edge runtime handler<br/>catches FileNotFoundError"]
  H --> I["Return 404<br/>model_not_cached"]


File Changes

1. common/llamafarm_common/model_format.py 🐞 Bug fix +12/-0

Add offline guard to detect_model_format function

• Import offline_mode module for offline state checking
• Add offline guard before HfApi() construction that raises FileNotFoundError
• Guard mirrors existing pattern in list_gguf_files for consistency
• Error message provides actionable guidance for offline users

2. common/tests/test_model_format_offline.py 🧪 Tests +148/-0

Comprehensive offline-mode tests for detect_model_format

• New test file with 8 comprehensive offline-mode test cases
• Verify FileNotFoundError is raised without constructing HfApi in offline mode
• Test that error message contains helpful guidance about LLAMAFARM_MODEL_DIR
• Verify offline guard does not block local cache or model directory hits
• Confirm online mode still calls API as before
• Test .gguf extension short-circuit works in offline mode

3. runtimes/edge/routers/chat_completions/service.py 🐞 Bug fix +17/-0

Map offline FileNotFoundError to 404 response

• Add handler for FileNotFoundError with LLAMAFARM_OFFLINE in message
• Map new error to same 404 model_not_cached response as OfflineModeIsEnabled
• Preserve API contract for clients by returning identical HTTP response
• Use substring check to avoid accidentally catching unrelated FileNotFoundErrors


@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented Apr 29, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)



Remediation recommended

1. Brittle offline error detection 🐞 Bug ⚙ Maintainability
Description
The edge chat-completions handler identifies the new offline-mode failure by substring-matching the
FileNotFoundError message ("LLAMAFARM_OFFLINE" in str(e)), so a future message change in
detect_model_format will silently stop returning the intended 404 model_not_cached and fall through
to a 500.
Code

runtimes/edge/routers/chat_completions/service.py[R1579-1590]

+            if isinstance(e, FileNotFoundError) and "LLAMAFARM_OFFLINE" in str(e):
+                logger.warning(f"Model not cached locally (offline mode): {e}")
+                raise HTTPException(
+                    status_code=404,
+                    detail={
+                        "error": "model_not_cached",
+                        "message": (
+                            f"Model '{chat_request.model}' is not cached locally "
+                            "and offline mode is enabled"
+                        ),
+                    },
+                ) from e
Evidence
The handler’s behavior depends on a specific message token, and that token is currently produced by
detect_model_format’s newly added FileNotFoundError message. This couples API behavior to an
untyped, unstructured string that can change during refactors.

runtimes/edge/routers/chat_completions/service.py[1574-1590]
common/llamafarm_common/model_format.py[183-198]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Edge chat-completions maps offline missing-model errors to a 404 by checking whether a `FileNotFoundError` message contains the substring `LLAMAFARM_OFFLINE`. This is brittle because API behavior becomes dependent on an error-message string that may change.
### Issue Context
`detect_model_format()` now raises `FileNotFoundError` in offline mode with a message containing `LLAMAFARM_OFFLINE=1`, and the edge handler matches that string to decide whether to return `404 model_not_cached`.
### Fix Focus Areas
- common/llamafarm_common/model_format.py[183-198]
- runtimes/edge/routers/chat_completions/service.py[1574-1590]
### Suggested fix
- Introduce a dedicated exception type in `llamafarm_common` (e.g., `class OfflineModelNotCachedError(FileNotFoundError): ...`) and raise that from `detect_model_format()`.
- In the edge handler, replace the substring match with `isinstance(e, OfflineModelNotCachedError)` (or catch it explicitly).
- Keep the message content/operator guidance as-is, but make the programmatic detection rely on the exception type, not the message.



2. Offline 404 not consistent 🐞 Bug ☼ Reliability
Description
With detect_model_format now raising FileNotFoundError in offline mode, edge endpoints that call
load_language without the chat-completions exception mapping (e.g., /v1/completions) will return an
unhandled 500 instead of a structured 404 model_not_cached for the same “model not cached while
offline” condition.
Code

common/llamafarm_common/model_format.py[R187-192]

+    if offline_mode.is_offline():
+        raise FileNotFoundError(
+            f"detect_model_format({base_model_id!r}) refused in offline mode "
+            f"(LLAMAFARM_OFFLINE=1). Place the model under $LLAMAFARM_MODEL_DIR "
+            f"or pre-populate the HuggingFace cache."
+        )
Evidence
detect_model_format now deterministically raises FileNotFoundError when offline and no local hit
exists. /v1/completions calls load_language directly with no try/except translating that
FileNotFoundError into a client-facing 404, while chat-completions does translate it.

common/llamafarm_common/model_format.py[183-198]
runtimes/edge/routers/completions.py[44-54]
runtimes/edge/routers/chat_completions/service.py[1574-1590]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`detect_model_format()` now raises a `FileNotFoundError` in strict offline mode when it would otherwise fall back to the HuggingFace API. Endpoints that call `load_language()` without translating this specific offline-mode failure will surface a generic 500 instead of the structured `404 model_not_cached` response used by chat-completions.
### Issue Context
- `/v1/chat/completions` maps this offline-mode error to a 404.
- `/v1/completions` calls `load_language()` directly and does not add similar mapping.
### Fix Focus Areas
- runtimes/edge/routers/completions.py[44-54]
- common/llamafarm_common/model_format.py[183-198]
### Suggested fix
Option A (preferred if you implement a structured exception type):
- Catch the new `OfflineModelNotCachedError` (see other finding) in `/v1/completions` and raise the same `HTTPException(status_code=404, detail={"error":"model_not_cached", ...})` shape.
Option B (minimal change):
- Add a try/except around `load_language(...)` in `/v1/completions` and map `FileNotFoundError` with the offline marker to `404 model_not_cached`, mirroring chat-completions.
Also consider adding a small test to pin this behavior for `/v1/completions` in offline mode.




@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

All E2E Tests Passed!

Test Results by Platform

OS Status
ubuntu-latest ✅ Passed
macos-latest ✅ Passed
windows-latest ✅ Passed

This comment was automatically generated by the E2E Tests workflow.


@cubic-dev-ai Bot left a comment


No issues found across 3 files

@rachmlenig added the merge-when-approved label Apr 29, 2026
rachmlenig added 2 commits May 4, 2026 13:55
detect_model_format() raises FileNotFoundError in strict offline mode
when the requested model is not cached. /v1/chat/completions translates
this into a 404 model_not_cached response, but /v1/completions called
load_language() directly and surfaced a generic 500.

Mirror the chat-completions handling so API consumers see a stable
error contract regardless of which endpoint they hit.
…tion

Reviewer feedback: edge handlers were detecting the offline missing-model
case by checking whether a FileNotFoundError's message contained the
substring 'LLAMAFARM_OFFLINE'. That couples API behavior to an error
message string.

Introduce OfflineModelNotCachedError in llamafarm_common (subclass of
FileNotFoundError so existing callers still work). detect_model_format()
raises the typed exception, and the edge chat-completions and
completions handlers match on isinstance(...) instead of substring.
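
A minimal sketch of the typed-exception shape that commit describes (illustrative; the actual class body and call sites may differ):

class OfflineModelNotCachedError(FileNotFoundError):
    """Raised when offline mode blocks the HuggingFace fallback.

    Subclassing FileNotFoundError keeps existing broad except clauses
    working unchanged.
    """

def _refuse_offline(base_model_id: str) -> None:
    # Same operator guidance as before; only the exception type changes.
    raise OfflineModelNotCachedError(
        f"detect_model_format({base_model_id!r}) refused in offline mode "
        f"(LLAMAFARM_OFFLINE=1). Place the model under $LLAMAFARM_MODEL_DIR "
        f"or pre-populate the HuggingFace cache."
    )

Handlers then match with isinstance(e, OfflineModelNotCachedError) instead of the substring check.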

Labels

merge-when-approved Indicates that the author wants to merge this PR and is not planning to add further commits.
