Skip to content

✨ feat: validate model availability before run execution#416

Open
Marco Russo (marcorusso97) wants to merge 3 commits into
mainfrom
377-validate-model-availability-before-run-execution
Open

✨ feat: validate model availability before run execution#416
Marco Russo (marcorusso97) wants to merge 3 commits into
mainfrom
377-validate-model-availability-before-run-execution

Conversation

@marcorusso97
Copy link
Copy Markdown
Contributor

Summary

This PR introduces a model availability preflight in the attack orchestrator, so runs are aborted before execution starts when required model endpoints are unreachable.

What changed

  • Added pre-run model availability validation before Attack/Run DB records are created.
  • Added per-attack role mapping to discover all required model roles (target plus attack-specific roles).
  • Added robust attack type normalization for preflight role resolution (including alias handling such as AutoDANTurbo -> autodan_turbo).
  • Added live preflight progress output for each role:
    • Checking () ... OK/KO
  • Added internal noise suppression during probes:
    • temporarily silences internal logs and stdout/stderr emitted by provider libraries during health probes.
  • Added aggregated, user-friendly error formatting for unreachable models:
    • Unreachable models:
      • role=... identifier=... endpoint=... error=...
  • Updated failure behavior:
    • on preflight failure, log a configuration error and gracefully stop the run early, instead of proceeding.
    • run startup is blocked and no Attack/Run records are created in this case.

How the healthcheck works

  1. Prepare attack parameters and resolve goals.
  2. Build a list of required targets:
    • always include target model from the existing router.
    • include attack-specific roles from the role-path map (for example attacker/scorer/summarizer/embedder, judge variants, decorator role, etc.).
    • include category_classifier unless explicit intent taxonomy labels are already provided.
  3. For each target, run a lightweight probe:
    • for existing target: use the already registered router.
    • for configured role models: create a temporary router from role config.
    • issue a minimal request:
      • one user message: healthcheck
      • max_tokens=1
      • temperature=0.0
  4. Probe result handling:
    • if router initialization fails or request raises, mark KO with the captured error.
    • if response has error_message, mark KO.
    • if response is non-dict, treat as inconclusive-pass (to avoid false negatives with custom adapters/tests).
  5. Print per-role progress and final status (OK/KO).
  6. If any target is unreachable:
    • build one aggregated multiline error report listing role, identifier, endpoint, and error.
    • abort before creating Attack/Run records.
  7. If all checks pass, continue with normal run creation and execution.

Tests

Extended orchestrator tests now cover:

  • unreachable model message content and formatting.
  • multi-model aggregation in a single preflight failure report.
  • no Attack/Run DB creation when preflight fails.
  • attack-type normalization regression coverage for AutoDAN aliases.

Why this is useful

  • Prevents expensive or noisy runs when dependencies are misconfigured.
  • Gives immediate, actionable feedback on exactly which model endpoint is failing.
  • Improves reliability and UX with clear preflight visibility and graceful early abort.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 81.15942% with 39 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
hackagent/attacks/orchestrator.py 80.97% 39 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Validate model availability before run execution

1 participant