fix(common): guard detect_model_format HF fallback in offline mode #831
rachmlenig wants to merge 3 commits into main
Conversation
When LLAMAFARM_OFFLINE=1, detect_model_format previously fell through to huggingface_hub.HfApi.list_repo_files if neither the local HF cache nor LLAMAFARM_MODEL_DIR resolved the model. That produced a noisy OfflineModeIsEnabled traceback from inside huggingface_hub instead of the structured FileNotFoundError that list_gguf_files (in the same package) already raises. Add a matching guard before the HfApi() construction so callers see a consistent offline-mode error format, and update the edge runtime's chat-completions handler to map the new FileNotFoundError to the same 404 model_not_cached response that the OfflineModeIsEnabled branch produced — preserving the API contract for clients.
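A minimal sketch of the described fix, assuming an `_is_offline()` helper shaped like the env-var checks mentioned elsewhere in this PR (the cache and `LLAMAFARM_MODEL_DIR` tier internals are elided; this is not the actual package code):

```python
import os


def _is_offline() -> bool:
    # Assumed shape of the package's offline check: LLAMAFARM_OFFLINE=1 puts
    # every network-touching helper into strict offline mode.
    return os.environ.get("LLAMAFARM_OFFLINE", "").strip().lower() in {"1", "true", "yes"}


def detect_model_format(model_id: str) -> str:
    # Tier 1: extension short-circuit -- no I/O.
    if model_id.endswith(".gguf"):
        return "gguf"
    # Tiers 2-3 (HF cache scan, $LLAMAFARM_MODEL_DIR) elided in this sketch.
    # New guard: raise a structured, actionable error instead of letting
    # huggingface_hub raise OfflineModeIsEnabled from deep inside its HTTP layer.
    if _is_offline():
        raise FileNotFoundError(
            f"LLAMAFARM_OFFLINE=1: model '{model_id}' is not cached locally; "
            "place it under $LLAMAFARM_MODEL_DIR or pre-populate the HuggingFace cache"
        )
    # Tier 4 (online only): HfApi().list_repo_files(model_id) -- elided here.
    raise NotImplementedError("online fallback elided in this sketch")
```

The key property is that the guard sits before any `HfApi` construction, so in offline mode no huggingface_hub networking code is ever reached.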
Review Summary by Qodo

Guard detect_model_format HF fallback in offline mode
Walkthrough

Description
• Add offline-mode guard to detect_model_format to prevent HuggingFace API calls
• Raise structured FileNotFoundError instead of OfflineModeIsEnabled traceback
• Map new FileNotFoundError to existing 404 response in edge runtime handler
• Add comprehensive offline-mode test suite for model format detection

Diagram

flowchart LR
A["detect_model_format called<br/>with LLAMAFARM_OFFLINE=1"] --> B{"Check local cache<br/>or LLAMAFARM_MODEL_DIR"}
B -->|Hit| C["Return format<br/>no API call"]
B -->|Miss| D{"Offline guard<br/>is_offline()"}
D -->|True| E["Raise FileNotFoundError<br/>with guidance"]
D -->|False| F["Call HfApi<br/>list_repo_files"]
F --> G["Return format"]
E --> H["Edge runtime handler<br/>catches FileNotFoundError"]
H --> I["Return 404<br/>model_not_cached"]
File Changes
1. common/llamafarm_common/model_format.py
Code Review by Qodo
✅ All E2E Tests Passed!

Test Results by Platform
Summary
This comment was automatically generated by the E2E Tests workflow.
detect_model_format() raises FileNotFoundError in strict offline mode when the requested model is not cached. /v1/chat/completions translates this into a 404 model_not_cached response, but /v1/completions called load_language() directly and surfaced a generic 500. Mirror the chat-completions handling so API consumers see a stable error contract regardless of which endpoint they hit.
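A framework-agnostic sketch of what sharing the mapping between the two endpoints could look like (function names such as `map_model_load_error` and the `(status, body)` return shape are illustrative, not the actual edge handler code):

```python
def map_model_load_error(exc: Exception):
    """Translate the strict-offline missing-model error into the stable
    404 model_not_cached contract; return None for errors we don't own."""
    if isinstance(exc, FileNotFoundError) and "LLAMAFARM_OFFLINE" in str(exc):
        return 404, {"error": {"code": "model_not_cached", "message": str(exc)}}
    return None


def completions_handler(load_language):
    # Mirrors the chat-completions handling: map the offline miss to 404,
    # let everything else surface as before (a generic 500 upstream).
    try:
        load_language()
        return 200, {"ok": True}
    except Exception as exc:
        mapped = map_model_load_error(exc)
        if mapped is not None:
            return mapped
        raise
```

Putting the mapping in one helper keeps the two endpoints from drifting apart again.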
…tion

Reviewer feedback: edge handlers were detecting the offline missing-model case by checking whether a `FileNotFoundError`'s message contained the substring `'LLAMAFARM_OFFLINE'`. That couples API behavior to an error message string. Introduce `OfflineModelNotCachedError` in `llamafarm_common` (subclass of `FileNotFoundError` so existing callers still work). `detect_model_format()` raises the typed exception, and the edge chat-completions and completions handlers match on `isinstance(...)` instead of substring.
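A sketch of the typed-exception approach described above (only the class name `OfflineModelNotCachedError` and its `FileNotFoundError` base come from this PR; the constructor signature and the `to_http_error` helper are assumptions for illustration):

```python
class OfflineModelNotCachedError(FileNotFoundError):
    """Raised when LLAMAFARM_OFFLINE=1 and the requested model is not cached.

    Subclassing FileNotFoundError keeps existing `except FileNotFoundError`
    callers working unchanged.
    """

    def __init__(self, model_id: str):
        self.model_id = model_id
        super().__init__(
            f"LLAMAFARM_OFFLINE=1: model '{model_id}' is not cached locally"
        )


def to_http_error(exc: Exception):
    # Handlers match on the exception type, not on a message substring.
    if isinstance(exc, OfflineModelNotCachedError):
        return 404, "model_not_cached"
    return None
```

With the type carrying the meaning, the error message can be reworded freely without breaking the API contract.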
Summary
- `detect_model_format` in `llamafarm_common.model_format` previously fell through to `huggingface_hub.HfApi.list_repo_files` even when `LLAMAFARM_OFFLINE=1`, producing an `OfflineModeIsEnabled` traceback from deep inside `huggingface_hub` instead of the structured `FileNotFoundError` that `list_gguf_files` (in the same package) already raises.
- Add an offline guard that mirrors the existing `list_gguf_files` pattern.
- Map the new `FileNotFoundError` to the same 404 model_not_cached response that the existing `OfflineModeIsEnabled` branch produces in the edge runtime's chat-completions handler, so the API contract is unchanged.

Background — what was broken
common/llamafarm_common/model_format.py: `detect_model_format` walks four resolution tiers:
1. `model_id.endswith(".gguf")` short-circuit — file extension only, no I/O.
2. `_check_local_cache_for_model` — scans the HuggingFace cache.
3. `resolve_from_model_dir` — checks `$LLAMAFARM_MODEL_DIR`.
4. `HfApi().list_repo_files(...)` — HTTP call to huggingface.co.

Tier 4 had no offline guard.
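Tier 3 might look roughly like this (a sketch under assumed semantics; the real `resolve_from_model_dir` may match files and formats differently):

```python
import os
from pathlib import Path


def resolve_from_model_dir(model_id: str):
    # Tier 3 sketch: look for a GGUF file matching the model's short name
    # under $LLAMAFARM_MODEL_DIR. Returns a format string on a hit, else None
    # so the caller can fall through to the next tier.
    model_dir = os.environ.get("LLAMAFARM_MODEL_DIR")
    if not model_dir:
        return None
    stem = model_id.split("/")[-1]
    hits = list(Path(model_dir).glob(f"{stem}*.gguf"))
    return "gguf" if hits else None
```

Because tiers 1-3 are purely local, they are exactly the ones allowed to run in strict offline mode; only tier 4 needs the guard.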
- `list_gguf_files` in the same package (model_utils.py:419) already raises a structured `FileNotFoundError` when `LLAMAFARM_OFFLINE=1` is set.
- `_binary.py` in `llamafarm-llama` does the same via `_is_offline()`/`get_lib_path()` (_binary.py:323, 387).
- `detect_model_format` was the outlier — likely missed during the refactor train (3eed3558 fix(common): remove filesystem probing from detect_model_format entirely, e29ecde9 fix(common): check LLAMAFARM_MODEL_DIR before HuggingFace API in detect_model_format).

Verified the gap exists in the released v0.0.32 via `git show v0.0.32:common/llamafarm_common/model_format.py` — `list_repo_files` is called unconditionally on line 185.

When the offline guard is missing, the call falls into `huggingface_hub`, which sees `HF_HUB_OFFLINE=1` (propagated from `LLAMAFARM_OFFLINE` by our bootstrap) and raises `OfflineModeIsEnabled`. The exception bubbles up as a noisy traceback when a clean `FileNotFoundError` with operator guidance ("place the model under `$LLAMAFARM_MODEL_DIR` or pre-populate the HuggingFace cache") would be much more useful — and consistent with the rest of the package.

What changed
common/llamafarm_common/model_format.py

Added `from . import offline_mode` and a guard between the `LLAMAFARM_MODEL_DIR` check and the `HfApi()` construction. This mirrors the existing
`list_gguf_files` shape exactly.

runtimes/edge/routers/chat_completions/service.py

The existing handler at line 1562 catches `huggingface_hub.errors.OfflineModeIsEnabled` and converts it to a 404 model_not_cached. With the guard in place, that exception is no longer raised — `FileNotFoundError` is. Added a parallel `isinstance(e, FileNotFoundError) and "LLAMAFARM_OFFLINE" in str(e)` branch that produces the same 404 response, so API consumers see no contract change. The `LLAMAFARM_OFFLINE` substring check keeps this from accidentally swallowing other `FileNotFoundError`s (e.g. a missing GGUF file in a non-offline scenario).

The universal runtime's chat handler does not have an analogous
`OfflineModeIsEnabled` branch, so no change is needed there.

common/tests/test_model_format_offline.py (new)

6 tests covering:
- Offline cache miss raises `FileNotFoundError` with `LLAMAFARM_OFFLINE` in the message, and `HfApi` is never constructed.
- `LLAMAFARM_MODEL_DIR` hit → still resolves cleanly.
- `.gguf` extension short-circuit works regardless of offline state.

Out of scope
- The status-surfacing work is in fix(edge): surface backend-init state on local/llm/status #830. This PR is the complementary fix on the `common` side: even with the status propagation in place, suppressing the `OfflineModeIsEnabled` traceback noise is its own win for operators.
- `list_gguf_files` and `_binary.py` are already correctly guarded — no change needed there.
- The offline env-var check in `_binary.py` (`_is_offline()`/`get_lib_path()`) is verified present and correct in v0.0.32. It is not a regression.
- `cd common && uv run pytest tests/` — 135/135 passing (6 new tests in test_model_format_offline.py).
- `cd runtimes/universal && uv run pytest tests/test_model_format.py` — 6/6 passing (existing tests unaffected).
- `cd runtimes/edge && uv run pytest tests/` — 23/23 passing.
- With `LLAMAFARM_OFFLINE=1` and a model not present locally, hit `/v1/chat/completions` and confirm a clean 404 model_not_cached response with no `OfflineModeIsEnabled` traceback in the logs.