
fix(common): guard detect_model_format HF fallback in offline mode #831

Open

rachmlenig wants to merge 3 commits into main from fix-common-guard-offline-detect-model-format

Conversation

@rachmlenig
Contributor

Summary

  • detect_model_format in llamafarm_common.model_format previously fell through to huggingface_hub.HfApi.list_repo_files even when LLAMAFARM_OFFLINE=1, producing an OfflineModeIsEnabled traceback from deep inside huggingface_hub instead of the structured FileNotFoundError that list_gguf_files (in the same package) already raises.
  • Add a matching offline guard mirroring the list_gguf_files pattern.
  • Map the new FileNotFoundError to the same 404 model_not_cached response that the existing OfflineModeIsEnabled branch produces in the edge runtime's chat-completions handler, so the API contract is unchanged.

Background — what was broken

common/llamafarm_common/model_format.py:detect_model_format walks four resolution tiers (a sketch follows the list):

  1. model_id.endswith(".gguf") short-circuit — file extension only, no I/O.
  2. _check_local_cache_for_model — scans the HuggingFace cache.
  3. resolve_from_model_dir — checks $LLAMAFARM_MODEL_DIR.
  4. HfApi().list_repo_files(...) — HTTP call to huggingface.co.
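As a sketch, the tier order looks roughly like this (helper signatures and the _format_from_repo_files name are assumptions based on the names above, not the literal source):

# Sketch only; signatures, return conventions, and _format_from_repo_files
# are illustrative assumptions, not the model_format.py source.
from huggingface_hub import HfApi

def detect_model_format(model_id: str) -> str:
    if model_id.endswith(".gguf"):                # tier 1: extension, no I/O
        return "gguf"
    fmt = _check_local_cache_for_model(model_id)  # tier 2: HF cache scan
    if fmt is not None:
        return fmt
    fmt = resolve_from_model_dir(model_id)        # tier 3: $LLAMAFARM_MODEL_DIR
    if fmt is not None:
        return fmt
    files = HfApi().list_repo_files(model_id)     # tier 4: network fallback
    return _format_from_repo_files(files)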

Tier 4 had no offline guard. list_gguf_files in the same package (model_utils.py:419) already raises a structured FileNotFoundError when LLAMAFARM_OFFLINE=1 is set. _binary.py in llamafarm-llama does the same via _is_offline() / get_lib_path() (_binary.py:323,387). detect_model_format was the outlier — likely missed during the refactor train (3eed3558 fix(common): remove filesystem probing from detect_model_format entirely, e29ecde9 fix(common): check LLAMAFARM_MODEL_DIR before HuggingFace API in detect_model_format).

Verified the gap exists in the released v0.0.32 via git show v0.0.32:common/llamafarm_common/model_format.py: list_repo_files is called unconditionally on line 185.

When the offline guard is missing, the call falls into huggingface_hub which sees HF_HUB_OFFLINE=1 (propagated from LLAMAFARM_OFFLINE by our bootstrap) and raises OfflineModeIsEnabled. The exception bubbles up as a noisy traceback when a clean FileNotFoundError with operator guidance ("place the model under $LLAMAFARM_MODEL_DIR or pre-populate the HuggingFace cache") would be much more useful — and consistent with the rest of the package.
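For reference, the bootstrap propagation amounts to something like this (a minimal sketch; the bootstrap itself is not part of this diff and the function name is hypothetical):

import os

def _propagate_offline_env() -> None:
    # Mirror the project-level flag into huggingface_hub's own switch so
    # any HF network call raises OfflineModeIsEnabled instead of going out.
    if os.environ.get("LLAMAFARM_OFFLINE") == "1":
        os.environ["HF_HUB_OFFLINE"] = "1"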

What changed

common/llamafarm_common/model_format.py
Added from . import offline_mode and a guard between the LLAMAFARM_MODEL_DIR check and the HfApi() construction:

if offline_mode.is_offline():
    raise FileNotFoundError(
        f"detect_model_format({base_model_id!r}) refused in offline mode "
        f"(LLAMAFARM_OFFLINE=1). Place the model under $LLAMAFARM_MODEL_DIR "
        f"or pre-populate the HuggingFace cache."
    )

This mirrors the existing list_gguf_files shape exactly.
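For context, offline_mode.is_offline() is assumed here to be a thin env-var check along these lines (a sketch; the real helper in llamafarm_common.offline_mode may accept a different truthy set):

import os

def is_offline() -> bool:
    # Accepted truthy values are an assumption of this sketch.
    return os.environ.get("LLAMAFARM_OFFLINE", "").strip().lower() in ("1", "true", "yes")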

runtimes/edge/routers/chat_completions/service.py
The existing handler at line 1562 catches huggingface_hub.errors.OfflineModeIsEnabled and converts it to a 404 model_not_cached. With the guard in place, that exception is no longer raised — FileNotFoundError is. Added a parallel isinstance(e, FileNotFoundError) and "LLAMAFARM_OFFLINE" in str(e) branch that produces the same 404 response, so API consumers see no contract change. The LLAMAFARM_OFFLINE substring check keeps this from accidentally swallowing other FileNotFoundErrors (e.g. a missing GGUF file in a non-offline scenario).

The universal runtime's chat handler does not have an analogous OfflineModeIsEnabled branch, so no change is needed there.

common/tests/test_model_format_offline.py (new)
6 tests covering (the first case is sketched after this list):

  • Offline + cache miss + model-dir miss → FileNotFoundError with LLAMAFARM_OFFLINE in the message, and HfApi is never constructed.
  • Offline + local-cache hit → still resolves cleanly without touching the network.
  • Offline + LLAMAFARM_MODEL_DIR hit → still resolves cleanly.
  • Online (no env var) → network path is reached as before.
  • .gguf extension short-circuit works regardless of offline state.
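
A sketch of the first case, with patching details assumed rather than copied from the test file:

import pytest

from llamafarm_common import model_format

def test_offline_miss_raises_without_hfapi(monkeypatch):
    monkeypatch.setenv("LLAMAFARM_OFFLINE", "1")
    # Force both local tiers to miss; attribute names follow the tier
    # walkthrough above, but the patching details are assumptions.
    monkeypatch.setattr(model_format, "_check_local_cache_for_model", lambda *a, **k: None)
    monkeypatch.setattr(model_format, "resolve_from_model_dir", lambda *a, **k: None)

    def _no_hfapi(*a, **k):
        raise AssertionError("HfApi must not be constructed in offline mode")

    monkeypatch.setattr(model_format, "HfApi", _no_hfapi)

    with pytest.raises(FileNotFoundError, match="LLAMAFARM_OFFLINE"):
        model_format.detect_model_format("some-org/some-model")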

Out of scope

  • The flight-safety status-propagation work is in fix(edge): surface backend-init state on local/llm/status #830. This PR is the complementary fix on the common side: even with the status propagation in place, suppressing the OfflineModeIsEnabled traceback noise is its own win for operators.
  • The HF-fallback paths inside list_gguf_files and _binary.py are already correctly guarded; no change needed there.
  • The is_offline() env-var check in _binary.py (_is_offline() / get_lib_path()) is verified present and correct in v0.0.32. It is not a regression.

Test plan

  • cd common && uv run pytest tests/ — 135/135 passing (6 new tests in test_model_format_offline.py).
  • cd runtimes/universal && uv run pytest tests/test_model_format.py — 6/6 passing (existing tests unaffected).
  • cd runtimes/edge && uv run pytest tests/ — 23/23 passing.
  • Manual smoke: with LLAMAFARM_OFFLINE=1 and a model not present locally, hit /v1/chat/completions and confirm a clean 404 model_not_cached response with no OfflineModeIsEnabled traceback in the logs (a scripted version of this check is sketched after this list).
  • CI: full Linux/macOS/Windows matrix.
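
A scripted equivalent of the manual smoke step (host, port, and model name are hypothetical; assumes the requests package is available):

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # host/port are assumptions
    json={
        "model": "some-org/not-cached-model",     # deliberately absent locally
        "messages": [{"role": "user", "content": "hi"}],
    },
    timeout=30,
)
assert resp.status_code == 404
assert resp.json()["detail"]["error"] == "model_not_cached"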

When LLAMAFARM_OFFLINE=1, detect_model_format previously fell through
to huggingface_hub.HfApi.list_repo_files if neither the local HF cache
nor LLAMAFARM_MODEL_DIR resolved the model. That produced a noisy
OfflineModeIsEnabled traceback from inside huggingface_hub instead of
the structured FileNotFoundError that list_gguf_files (in the same
package) already raises.

Add a matching guard before the HfApi() construction so callers see a
consistent offline-mode error format, and update the edge runtime's
chat-completions handler to map the new FileNotFoundError to the same
404 model_not_cached response that the OfflineModeIsEnabled branch
produced — preserving the API contract for clients.
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Guard detect_model_format HF fallback in offline mode

🐞 Bug fix


Walkthroughs

Description
• Add offline-mode guard to detect_model_format to prevent HuggingFace API calls
• Raise structured FileNotFoundError instead of OfflineModeIsEnabled traceback
• Map new FileNotFoundError to existing 404 response in edge runtime handler
• Add comprehensive offline-mode test suite for model format detection
Diagram
flowchart LR
  A["detect_model_format called<br/>with LLAMAFARM_OFFLINE=1"] --> B{"Check local cache<br/>or LLAMAFARM_MODEL_DIR"}
  B -->|Hit| C["Return format<br/>no API call"]
  B -->|Miss| D{"Offline guard<br/>is_offline()"}
  D -->|True| E["Raise FileNotFoundError<br/>with guidance"]
  D -->|False| F["Call HfApi<br/>list_repo_files"]
  F --> G["Return format"]
  E --> H["Edge runtime handler<br/>catches FileNotFoundError"]
  H --> I["Return 404<br/>model_not_cached"]


File Changes

1. common/llamafarm_common/model_format.py 🐞 Bug fix +12/-0

Add offline guard to detect_model_format function

• Import offline_mode module for offline state checking
• Add offline guard before HfApi() construction that raises FileNotFoundError
• Guard mirrors existing pattern in list_gguf_files for consistency
• Error message provides actionable guidance for offline users

2. common/tests/test_model_format_offline.py 🧪 Tests +148/-0

Comprehensive offline-mode tests for detect_model_format

• New test file with 8 comprehensive offline-mode test cases
• Verify FileNotFoundError is raised without constructing HfApi in offline mode
• Test that error message contains helpful guidance about LLAMAFARM_MODEL_DIR
• Verify offline guard does not block local cache or model directory hits
• Confirm online mode still calls API as before
• Test .gguf extension short-circuit works in offline mode

3. runtimes/edge/routers/chat_completions/service.py 🐞 Bug fix +17/-0

Map offline FileNotFoundError to 404 response

• Add handler for FileNotFoundError with LLAMAFARM_OFFLINE in message
• Map new error to same 404 model_not_cached response as OfflineModeIsEnabled
• Preserve API contract for clients by returning identical HTTP response
• Use substring check to avoid accidentally catching unrelated FileNotFoundErrors


@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented Apr 29, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)



Remediation recommended

1. Brittle offline error detection 🐞 Bug ⚙ Maintainability
Description
The edge chat-completions handler identifies the new offline-mode failure by substring-matching the
FileNotFoundError message ("LLAMAFARM_OFFLINE" in str(e)), so a future message change in
detect_model_format will silently stop returning the intended 404 model_not_cached and fall through
to a 500.
Code

runtimes/edge/routers/chat_completions/service.py[R1579-1590]

+            if isinstance(e, FileNotFoundError) and "LLAMAFARM_OFFLINE" in str(e):
+                logger.warning(f"Model not cached locally (offline mode): {e}")
+                raise HTTPException(
+                    status_code=404,
+                    detail={
+                        "error": "model_not_cached",
+                        "message": (
+                            f"Model '{chat_request.model}' is not cached locally "
+                            "and offline mode is enabled"
+                        ),
+                    },
+                ) from e
Evidence
The handler’s behavior depends on a specific message token, and that token is currently produced by
detect_model_format’s newly added FileNotFoundError message. This couples API behavior to an
untyped, unstructured string that can change during refactors.

runtimes/edge/routers/chat_completions/service.py[1574-1590]
common/llamafarm_common/model_format.py[183-198]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Edge chat-completions maps offline missing-model errors to a 404 by checking whether a `FileNotFoundError` message contains the substring `LLAMAFARM_OFFLINE`. This is brittle because API behavior becomes dependent on an error-message string that may change.
### Issue Context
`detect_model_format()` now raises `FileNotFoundError` in offline mode with a message containing `LLAMAFARM_OFFLINE=1`, and the edge handler matches that string to decide whether to return `404 model_not_cached`.
### Fix Focus Areas
- common/llamafarm_common/model_format.py[183-198]
- runtimes/edge/routers/chat_completions/service.py[1574-1590]
### Suggested fix
- Introduce a dedicated exception type in `llamafarm_common` (e.g., `class OfflineModelNotCachedError(FileNotFoundError): ...`) and raise that from `detect_model_format()`.
- In the edge handler, replace the substring match with `isinstance(e, OfflineModelNotCachedError)` (or catch it explicitly).
- Keep the message content/operator guidance as-is, but make the programmatic detection rely on the exception type, not the message.



2. Offline 404 not consistent 🐞 Bug ☼ Reliability
Description
With detect_model_format now raising FileNotFoundError in offline mode, edge endpoints that call
load_language without the chat-completions exception mapping (e.g., /v1/completions) will return an
unhandled 500 instead of a structured 404 model_not_cached for the same “model not cached while
offline” condition.
Code

common/llamafarm_common/model_format.py[R187-192]

+    if offline_mode.is_offline():
+        raise FileNotFoundError(
+            f"detect_model_format({base_model_id!r}) refused in offline mode "
+            f"(LLAMAFARM_OFFLINE=1). Place the model under $LLAMAFARM_MODEL_DIR "
+            f"or pre-populate the HuggingFace cache."
+        )
Evidence
detect_model_format now deterministically raises FileNotFoundError when offline and no local hit
exists. /v1/completions calls load_language directly with no try/except translating that
FileNotFoundError into a client-facing 404, while chat-completions does translate it.

common/llamafarm_common/model_format.py[183-198]
runtimes/edge/routers/completions.py[44-54]
runtimes/edge/routers/chat_completions/service.py[1574-1590]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`detect_model_format()` now raises a `FileNotFoundError` in strict offline mode when it would otherwise fall back to the HuggingFace API. Endpoints that call `load_language()` without translating this specific offline-mode failure will surface a generic 500 instead of the structured `404 model_not_cached` response used by chat-completions.
### Issue Context
- `/v1/chat/completions` maps this offline-mode error to a 404.
- `/v1/completions` calls `load_language()` directly and does not add similar mapping.
### Fix Focus Areas
- runtimes/edge/routers/completions.py[44-54]
- common/llamafarm_common/model_format.py[183-198]
### Suggested fix
Option A (preferred if you implement a structured exception type):
- Catch the new `OfflineModelNotCachedError` (see other finding) in `/v1/completions` and raise the same `HTTPException(status_code=404, detail={"error":"model_not_cached", ...})` shape.
Option B (minimal change):
- Add a try/except around `load_language(...)` in `/v1/completions` and map `FileNotFoundError` with the offline marker to `404 model_not_cached`, mirroring chat-completions.
Also consider adding a small test to pin this behavior for `/v1/completions` in offline mode.




@github-actions
Contributor

github-actions Bot commented Apr 29, 2026

All E2E Tests Passed!

Test Results by Platform

OS Status
ubuntu-latest ✅ Passed
macos-latest ✅ Passed
windows-latest ✅ Passed

This comment was automatically generated by the E2E Tests workflow.


@cubic-dev-ai Bot left a comment


No issues found across 3 files

@rachmlenig added the merge-when-approved label Apr 29, 2026
rachmlenig added 2 commits May 4, 2026 13:55
detect_model_format() raises FileNotFoundError in strict offline mode
when the requested model is not cached. /v1/chat/completions translates
this into a 404 model_not_cached response, but /v1/completions called
load_language() directly and surfaced a generic 500.

Mirror the chat-completions handling so API consumers see a stable
error contract regardless of which endpoint they hit.
…tion

Reviewer feedback: edge handlers were detecting the offline missing-model
case by checking whether a FileNotFoundError's message contained the
substring 'LLAMAFARM_OFFLINE'. That couples API behavior to an error
message string.

Introduce OfflineModelNotCachedError in llamafarm_common (subclass of
FileNotFoundError so existing callers still work). detect_model_format()
raises the typed exception, and the edge chat-completions and
completions handlers match on isinstance(...) instead of substring.
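
A minimal sketch of the typed-exception shape that commit describes (illustrative; the actual class body and call sites may differ):

class OfflineModelNotCachedError(FileNotFoundError):
    """Raised when offline mode blocks the HuggingFace fallback.

    Subclassing FileNotFoundError keeps existing broad except clauses
    working unchanged.
    """

def _refuse_offline(base_model_id: str) -> None:
    # Same operator guidance as before; only the exception type changes.
    raise OfflineModelNotCachedError(
        f"detect_model_format({base_model_id!r}) refused in offline mode "
        f"(LLAMAFARM_OFFLINE=1). Place the model under $LLAMAFARM_MODEL_DIR "
        f"or pre-populate the HuggingFace cache."
    )

Handlers then match with isinstance(e, OfflineModelNotCachedError) instead of the substring check.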

Labels

merge-when-approved Indicates that the author wants to merge this PR and is not planning to add further commits.
