FEAT: Runtime capability discovery for prompt targets #1699
hannahwestra25 wants to merge 21 commits into
Conversation
…lities in verify_target_async
# Probe a single dimension:
verified_caps = await query_target_capabilities_async(target=target)
It would be more intuitive IMO if it was target.get_capabilities() or (even better) target.capabilities (and similarly target.input_modalities / target.output_modalities) since these are static after instantiation (right?).
Having to import a function makes it a bit more obscure.
Hmm, I get your point about importing, but the idea is to query the capabilities when they are not known, and I think target.get_capabilities and target.capabilities don't convey that we're making API calls (potentially several, if we're querying multiple capabilities).
Also, this query is about inspecting a target rather than performing a responsibility the target actually has, so putting it in the target class bloats the class and confuses what a target actually does (a target doesn't query itself, it sends prompts). I'd also be concerned that users would read it as a getter for the already declared capabilities rather than something that actually makes API calls.
(Also, if it was unclear to you that this function makes API calls to determine the capabilities, I could rename it; something like discover_target_capabilities_async might be better?)
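For reference, the call shape from the diff reads roughly like this (a sketch only; the OpenAIChatTarget setup is an assumed example and not part of this PR):

```python
import asyncio

from pyrit.prompt_target import OpenAIChatTarget, query_target_capabilities_async


async def main():
    target = OpenAIChatTarget()  # assumed example target, configured via env
    # Makes live API calls to probe what the endpoint accepts, rather than
    # returning the capabilities that were statically declared at instantiation.
    verified_caps = await query_target_capabilities_async(target=target)
    print(verified_caps)


asyncio.run(main())
```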
…ra/query_target_capabilities
async def _send_and_check_async(
should we add a backoff here?
Good question! I don't think backoff is needed here: these probes are trying to distinguish supported vs. unsupported, and most failures are deterministic rather than transient, so waiting longer usually doesn't change the answer and just adds time when retrying a capability that isn't supported. I think a single immediate retry is enough to cover brief network noise or a one-off timeout without making capability detection noticeably slower. wdyt?
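To make the intent concrete, a minimal sketch of the single-immediate-retry idea (the helper name and exception set are assumptions, not the PR's actual implementation):

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def _retry_once(probe: Callable[[], Awaitable[T]]) -> T:
    # One immediate retry, no backoff: most probe failures are deterministic
    # "unsupported" rejections, so waiting longer rarely changes the answer.
    try:
        return await probe()
    except (asyncio.TimeoutError, ConnectionError):
        return await probe()
```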
original_value='Respond with a JSON object: {"ok": true}.',
original_value_data_type="text",
conversation_id=conversation_id,
prompt_metadata=_probe_metadata({"response_format": "json"}),
I understand that the point of this probe is not to check whether the capability actually takes effect in the response, but for _probe_json_output_async and _probe_json_schema_async it seems the metadata is only ever parsed by the Responses and Chat targets and silently ignored by other PromptTarget subclasses, if I'm understanding correctly. That feels different from being silently ignored by the endpoint at inference time (as with the other probes). Would we be able to check whether the returned value is JSON formatted? Or at least be clearer in the docstring that this only applies to specific targets.
Good point! I updated the docstring to convey this restriction, but yes, that is the limitation here. For some targets the JSON hint is converted into a real provider parameter; for others it is only metadata on the PyRIT side and never becomes a structured-output request. Parsing the returned text as JSON would test output compliance, but not native JSON-mode support. Let me know if that isn't clear in the comments!
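To illustrate the distinction (a hedged sketch, not code from this PR): parsing the reply only tests output compliance, not whether the provider honored a native JSON-mode parameter:

```python
import json


def reply_is_valid_json(reply_text: str) -> bool:
    # True if the model happened to emit parseable JSON; says nothing about
    # whether the endpoint natively supports a structured-output / JSON mode.
    try:
        json.loads(reply_text)
        return True
    except json.JSONDecodeError:
        return False
```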
CapabilityName.MULTI_TURN: UnsupportedCapabilityBehavior.RAISE,
CapabilityName.SYSTEM_PROMPT: UnsupportedCapabilityBehavior.RAISE,
q: why only RAISE on these two?
These are the only two capabilities that can be adapted, and we don't want to allow that here because we just want to know what is natively supported, not what can be made to work through adaptation.
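To illustrate what "adapted" means here (an assumption about the mechanism, for illustration only): if an endpoint rejects system prompts, support could be faked by folding the system prompt into the user turn, which is exactly what forcing RAISE during probing rules out:

```python
def adapt_system_prompt(system_prompt: str, user_prompt: str) -> str:
    # Adapted (not native) support: fold the system prompt into the user turn.
    # Probes force RAISE instead, so they report only native support.
    return f"{system_prompt}\n\n{user_prompt}"
```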
Description
Adds query_target_capabilities.py, which probes a PromptTarget at runtime to determine what the underlying endpoint actually accepts. Useful for custom OpenAI-compatible endpoints, gateways that strip features, or any deployment where declared capabilities may not match real behavior.

New public API (exported from pyrit.prompt_target), with a usage sketch below:

- query_target_capabilities_async — probes boolean capability flags (SYSTEM_PROMPT, MULTI_MESSAGE_PIECES, MULTI_TURN, JSON_OUTPUT, JSON_SCHEMA).
- verify_target_modalities_async — probes which input-modality combinations are accepted.
- verify_target_async — runs both and returns a populated TargetCapabilities.
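A minimal usage sketch based on the description above; the OpenAIChatTarget setup and keyword names other than target and per_probe_timeout_s are assumptions, not the PR's confirmed signature:

```python
import asyncio

from pyrit.prompt_target import OpenAIChatTarget, verify_target_async


async def main():
    target = OpenAIChatTarget()  # assumed example target, configured via env
    # Runs both the capability and modality probes against the live endpoint.
    caps = await verify_target_async(target=target, per_probe_timeout_s=30)
    print(caps)  # a populated TargetCapabilities


asyncio.run(main())
```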
per_probe_timeout_s(default 30s) and retried once on transient errors. The target's configuration is temporarily replaced with a permissive one so_validate_requestdoesn't short-circuit. Probe-written memory rows are tagged withprompt_metadata["capability_probe"] == "1".Caveats (in docstrings)
Tests & docs