✨ feat: allow different judge models for same judge type and show stats in dashboard #420

Open
Marco Russo (marcorusso97) wants to merge 1 commit into main from 414-allow-usage-of-multiple-judges-of-same-type

Marco Russo (marcorusso97) commented Jun 4, 2026

Summary

This PR introduces full support for running multiple judges of the same type with different models, and ensures their outputs are correctly tracked, aggregated, and rendered across the dashboard.

It also fixes consistency issues between summary panels and expanded detail views, so judge counts, names, metrics, and verdicts stay aligned.

Why

When two or more judges shared the same type, their vote keys could collide, so one judge's votes would overwrite another's.
This caused missing judges, incorrect counts, incomplete strictness/ASR values, and absent verdict blocks in detail cards.

What Changed

Multi-judge key stability

  • Added deterministic suffixing for duplicate judge types.
  • Preserved distinct per-judge votes using canonical keys such as:
    • eval_hb
    • eval_hbv_1
    • eval_hbv_2
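
The suffixing rule is only shown through the example keys above. As a minimal sketch, assuming a type configured once keeps the plain `eval_<type>` key and each repeat of a type gets a 1-based `_<n>` suffix in configuration order (which matches `eval_hb`, `eval_hbv_1`, `eval_hbv_2`), key assignment could look like this; `assign_judge_keys` is a hypothetical helper, not the PR's code:

```python
from __future__ import annotations

from collections import Counter


def assign_judge_keys(judge_types: list[str]) -> list[str]:
    """Map each configured judge, in config order, to a stable vote key.

    Hypothetical scheme: a type that appears once keeps "eval_<type>";
    a repeated type gets a 1-based "_<n>" suffix in configuration order.
    """
    totals = Counter(judge_types)  # how many judges share each type
    seen: Counter[str] = Counter()  # running count per repeated type
    keys: list[str] = []
    for judge_type in judge_types:
        if totals[judge_type] == 1:
            keys.append(f"eval_{judge_type}")
        else:
            seen[judge_type] += 1
            keys.append(f"eval_{judge_type}_{seen[judge_type]}")
    return keys


print(assign_judge_keys(["hb", "hbv", "hbv"]))
# ['eval_hb', 'eval_hbv_1', 'eval_hbv_2']
```

Because the suffix depends only on configuration order, re-running the same config yields the same keys; under this assumed scheme, reordering judges of the same type in the config would renumber them.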

Evaluation and metrics pipeline

  • Updated evaluation handling to avoid overwriting votes from repeated judge types.
  • Improved aggregation and sync logic to preserve per-judge outputs end to end.
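
To illustrate the overwrite this section addresses, here is a minimal sketch contrasting votes stored by judge type with votes stored by canonical key, plus a per-judge ASR. The `JudgeResult` record, its boolean `jailbroken` verdict, and the ASR definition (share of goals a judge marked as jailbroken) are assumptions for illustration, not the project's actual data model:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeResult:
    key: str          # canonical vote key, e.g. "eval_hbv_1"
    judge_type: str   # judge type shared by duplicates, e.g. "hbv"
    jailbroken: bool  # hypothetical verdict for a single goal


def votes_by_type(results: list[JudgeResult]) -> dict[str, bool]:
    # Collision-prone: judges of the same type share a key, so the last one wins.
    return {r.judge_type: r.jailbroken for r in results}


def votes_by_key(results: list[JudgeResult]) -> dict[str, bool]:
    # Collision-free: one entry per judge; a repeated key is treated as a bug.
    votes: dict[str, bool] = {}
    for r in results:
        if r.key in votes:
            raise ValueError(f"duplicate judge key: {r.key}")
        votes[r.key] = r.jailbroken
    return votes


def per_judge_asr(goal_votes: list[dict[str, bool]]) -> dict[str, float]:
    # Assumed ASR definition: share of judged goals this judge marked jailbroken.
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for votes in goal_votes:
        for key, jailbroken in votes.items():
            totals[key] = totals.get(key, 0) + 1
            hits[key] = hits.get(key, 0) + int(jailbroken)
    return {key: hits[key] / totals[key] for key in totals}


results = [
    JudgeResult("eval_hb", "hb", True),
    JudgeResult("eval_hbv_1", "hbv", True),
    JudgeResult("eval_hbv_2", "hbv", False),
]
print(votes_by_type(results))  # {'hb': True, 'hbv': False}: eval_hbv_1 is lost
print(votes_by_key(results))   # all three judges kept
```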

Dashboard enrichment and rendering

  • Standardized propagation of:
    • judge votes
    • judge metadata (name/type)
    • per-goal multi-judge metrics
  • Added robust fallbacks for legacy runs and sparse trace payloads.
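
For legacy runs or sparse trace payloads that carry votes but no judge metadata, one possible fallback is to derive a display name and type from the vote key itself. This is a hedged sketch, not the PR's implementation; `resolve_judge_meta` and the `{"name", "type"}` metadata shape are hypothetical:

```python
from __future__ import annotations

import re

# Hypothetical key format: "eval_<type>" with an optional numeric "_<n>" suffix.
# Heuristic: a single judge whose type itself ends in "_<digits>" would be misread.
_KEY_RE = re.compile(r"^eval_(?P<type>.+?)(?:_(?P<index>\d+))?$")


def resolve_judge_meta(
    key: str, meta: dict[str, dict[str, str]] | None
) -> dict[str, str]:
    """Return {"name", "type"} for a vote key, tolerating missing metadata."""
    entry = (meta or {}).get(key) or {}  # sparse payloads may lack the entry
    match = _KEY_RE.match(key)
    fallback_type = match.group("type") if match else key
    judge_type = entry.get("type") or fallback_type
    if entry.get("name"):
        name = entry["name"]
    elif match and match.group("index"):
        name = f"{judge_type} #{match.group('index')}"  # distinguish duplicates
    else:
        name = judge_type
    return {"name": name, "type": judge_type}


print(resolve_judge_meta("eval_hbv_2", None))
# {'name': 'hbv #2', 'type': 'hbv'}
```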
  • Unified multi-judge verdict styling and behavior across attack cards.

Attack card updates

  • Improved verdict rendering in:
    • AdvPrefix
    • PAP
    • Baseline
    • BoN
    • Generic card paths used by FlipAttack, CipherChat, and H4RM3L
  • These changes do not apply to scorer-based attacks (AutoDan-Turbo, PAIR, TAP).
  • Ensured verdicts also appear in mitigated scenarios when judge votes are available.
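
The display rule in this list can be summarized as a small function. The sketch below is hypothetical (the `render_verdicts` helper, the lowercase attack identifiers, and the `JAILBROKEN`/`SAFE` labels are illustrative): scorer-based attacks never get a verdict block, and every other card shows one line per judge whenever votes exist, regardless of whether the scenario was mitigated.

```python
from __future__ import annotations

# Scorer-based attacks named in this PR; the lowercase identifiers are assumed.
SCORER_BASED_ATTACKS = frozenset({"autodan-turbo", "pair", "tap"})


def render_verdicts(
    attack: str,
    judge_votes: dict[str, bool] | None,
    judge_names: dict[str, str] | None = None,
) -> list[str]:
    """Return one verdict line per judge, or [] when no verdict block is shown."""
    if attack.lower() in SCORER_BASED_ATTACKS or not judge_votes:
        return []
    names = judge_names or {}
    # Mitigation status is deliberately not consulted: available votes are enough.
    # Dict insertion order is kept, so judges appear in the order votes were stored.
    return [
        f"{names.get(key, key)}: {'JAILBROKEN' if verdict else 'SAFE'}"
        for key, verdict in judge_votes.items()
    ]


print(render_verdicts("PAP", {"eval_hbv_1": True, "eval_hbv_2": False}))
# ['eval_hbv_1: JAILBROKEN', 'eval_hbv_2: SAFE']
```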

Tests

  • Updated unit tests for:
    • evaluation step
    • sync behavior
    • metrics behavior
  • Added coverage for repeated judge-type scenarios and key-collision prevention.

Impact

  • Backward compatible with existing runs.
  • Eliminates duplicate-judge key collisions.
  • Improves reliability and transparency of multi-judge analytics in the dashboard.

Successfully merging this pull request may close this issue: Allow usage of multiple judges of same type
