feat(DATAGO-130042): synthetic-monitor auth path for Datadog smoke tests #1461
Adds a config-gated, non-interactive auth path so Datadog Synthetics (and any future machine client) can authenticate against the gateway without driving the SSO UI. Used by an upcoming end-to-end smoke test that sends a chat prompt and asserts a non-empty LLM response.

How it works:
- Caller mints an Entra ID JWT via OAuth client_credentials with a dedicated app role (no user identity in the token).
- New synthetic.py validates that JWT locally against the tenant JWKS and maps it to a fixed synthetic principal (user_id = "synthetic-monitor", non-routable email, is_synthetic flag) so downstream code that expects a current_user keeps working.
- Middleware runs the synthetic branch ahead of sam_access_token and IdP paths. Synthetic-shaped tokens that fail validation are hard-rejected (401) rather than falling through, so an attacker can't probe the validator by varying claims.

Security controls (each independently load-bearing):
- RS256-only; alg=none and HS variants refused by PyJWT.
- Standard JWT validation: signature, iss, aud, tid, exp, nbf, iat with 60s skew.
- Strict roles == [configured-role] (no privilege creep via extras).
- appid/azp must be in a configured allowlist of Datadog SP GUIDs.
- Any user-identity claim (sub/oid/preferred_username/upn/unique_name) hard-rejects the token.
- Endpoint allowlist (method + path regex) enforced at the auth layer before any handler runs. Default deny.
- Config fail-closed: missing required field, empty appid allowlist, or empty endpoint allowlist all disable the path entirely.

Disabled by default. Enabled via component config:
- synthetic_auth_enabled
- synthetic_auth_tenant_id
- synthetic_auth_audience
- synthetic_auth_role_name
- synthetic_auth_appid_allowlist
- synthetic_auth_endpoint_allowlist

Tests cover discriminator boundary (NotApplicable vs Invalid), every rejection path, happy path, config fail-closed cases, and endpoint allowlist behaviour including method mismatch and path-extension attempts.
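To make the endpoint allowlist control concrete, here is a minimal sketch of a default-deny check on method plus path regex. The `EndpointRule` shape, the `is_endpoint_allowed` name, and the example paths are assumptions for illustration; they are not taken from synthetic.py.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRule:
    """One allowlist entry: HTTP method plus path regex (hypothetical shape)."""
    method: str
    path_pattern: str


def is_endpoint_allowed(method: str, path: str, rules: list) -> bool:
    """Return True only if some rule matches both the method and the whole path."""
    for rule in rules:
        if rule.method.upper() != method.upper():
            continue
        # fullmatch (not match/search) so a longer path cannot ride on a
        # rule written for a shorter one (the "path-extension" case).
        if re.fullmatch(rule.path_pattern, path):
            return True
    return False  # default deny, including when the allowlist is empty


# Placeholder rules; these paths are NOT the gateway's real routes.
RULES = [
    EndpointRule("POST", r"/api/v1/chat"),
    EndpointRule("GET", r"/api/v1/tasks/[A-Za-z0-9_-]+"),
]
```

Anchoring with `re.fullmatch` is one way to defeat path-extension attempts; the real implementation may anchor its patterns differently.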
issuer = f"https://login.microsoftonline.com/{tenant_id}/v2.0"
jwks_uri = f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"
Do you think we could allow other OIDC providers too, similar to how it is done here?
https://github.com/SolaceDev/solace-agent-mesh-enterprise/blob/main/examples/oauth2_config.yaml#L44
Fair point. That said, going generic across OIDC providers (Okta, Auth0, etc.) is a bigger change than making the URLs configurable, because claim names and semantics differ across IdPs. The oauth2_config.yaml analogy works for user-OIDC because sub/email/name are pretty standard, but client-credentials tokens are much less standardized. If a second IdP becomes a real requirement we can add a per-provider abstraction at that point, which is likely cleaner than parameterizing what's currently here.
For now this is solely for our internal Datadog Synthetic test against solace-chat, so I'd rather not preemptively build the generic shape.
The first real client_credentials token from Entra revealed two
over-strict checks that would reject every legitimately-issued token:
- Entra defaults to v1 issuer (sts.windows.net) for client_credentials
unless the resource opts into v2 via accessTokenAcceptedVersion=2.
Accept both v1 and v2 issuers for the configured tenant.
- Entra includes sub and oid in every app-only token to identify the
service principal; they are not user identity. Drop them from the
user-claim rejection list. preferred_username, upn, and unique_name
remain — those only appear in user-context tokens.
Add appidacr ("Application Authentication Context Reference") as a
compensating control: must be "1" (client secret) or "2" (certificate).
Rejects "0" (public client, no auth) and missing-claim cases.
Tests: 29 → 31. New cases cover v1-issuer acceptance, sub/oid
acceptance, appidacr="0" rejection, and missing-appidacr rejection.
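The claim rules after this commit can be sketched roughly as follows. This is an illustration only: it runs on claims that have already passed signature, aud, and exp verification, and the function and constant names are assumptions rather than the contents of synthetic.py.

```python
ALLOWED_APPIDACR = {"1", "2"}  # "1" = client secret, "2" = certificate
USER_ONLY_CLAIMS = ("preferred_username", "upn", "unique_name")


class SyntheticTokenInvalid(Exception):
    """Synthetic-shaped token that failed a check; caller hard-rejects."""


def accepted_issuers(tenant_id: str) -> set:
    """Both the v1 (sts.windows.net) and v2 issuers for the configured tenant."""
    return {
        f"https://sts.windows.net/{tenant_id}/",
        f"https://login.microsoftonline.com/{tenant_id}/v2.0",
    }


def check_verified_claims(claims: dict, *, tenant_id: str, role_name: str,
                          appid_allowlist: set) -> None:
    """Raise SyntheticTokenInvalid unless every claim rule holds."""
    if claims.get("iss") not in accepted_issuers(tenant_id):
        raise SyntheticTokenInvalid("unexpected issuer")
    if claims.get("tid") != tenant_id:
        raise SyntheticTokenInvalid("unexpected tenant")
    if claims.get("roles") != [role_name]:  # strict equality, not "contains"
        raise SyntheticTokenInvalid("roles must be exactly the configured role")
    if (claims.get("appid") or claims.get("azp")) not in appid_allowlist:
        raise SyntheticTokenInvalid("client app not in allowlist")
    # A missing claim yields None, which is not in the set, so it is rejected.
    if claims.get("appidacr") not in ALLOWED_APPIDACR:
        raise SyntheticTokenInvalid("client was not authenticated")
    # sub/oid are deliberately absent from this list: they identify the
    # service principal in app-only tokens.
    for name in USER_ONLY_CLAIMS:
        if name in claims:
            raise SyntheticTokenInvalid(f"user-context claim present: {name}")
```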
…peek
SonarQube's python:S5659 rightly flags any unverified jwt.decode call. This particular call is a routing-only discriminator: it decides whether to route the token through the synthetic auth branch vs fall through to IdP/sam_access_token, before we know if Entra-JWKS signature verification will succeed.

The peek is never used for any auth decision. All auth decisions (appid allowlist, role strict match, claim-absence checks, audience, issuer, tenant, expiry) use the verified claims obtained from a second jwt.decode call below that does verify the signature against the JWKS. A forged token that passes the routing peek is hard-rejected at the signature verification step.

Strengthen the inline comment to make the intent and safety bound unambiguous, and add NOSONAR(python:S5659) so reviewers see the suppression is deliberate.
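The peek-then-verify pattern described above might look like the sketch below. It uses only the standard library for the unverified peek (a stand-in for the unverified jwt.decode call), and the `verify` callable stands in for the second, signature-verifying decode. The discriminator used here (the configured role appearing in the unverified roles claim) is an assumption; the real routing rule is not shown in this thread.

```python
import base64
import json


class SyntheticTokenNotApplicable(Exception):
    """Not synthetic-shaped; caller falls through to other auth paths."""


class SyntheticTokenInvalid(Exception):
    """Synthetic-shaped but failed validation; caller hard-rejects (401)."""


def peek_unverified_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying anything. Routing use only."""
    parts = token.split(".")
    if len(parts) != 3:
        raise SyntheticTokenNotApplicable("not a compact JWT")
    padded = parts[1] + "=" * (-len(parts[1]) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError as exc:  # bad base64, bad UTF-8, or bad JSON
        raise SyntheticTokenNotApplicable("payload is not JSON") from exc
    if not isinstance(claims, dict):
        raise SyntheticTokenNotApplicable("payload is not a JSON object")
    return claims


def authenticate_synthetic(token: str, *, expected_role: str, verify) -> dict:
    """Route on the peek, then decide only on what `verify` returns."""
    roles = peek_unverified_claims(token).get("roles")
    if not (isinstance(roles, list) and expected_role in roles):
        raise SyntheticTokenNotApplicable("no synthetic role claim")
    # From here on the token is synthetic-shaped: never fall through.
    try:
        return verify(token)  # hypothetical: signature + claim validation
    except Exception as exc:
        raise SyntheticTokenInvalid(str(exc)) from exc
```

Nothing read from the peek is returned to the caller; only the output of `verify` reaches downstream code, which is the safety bound the commit message describes.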
Address review feedback (#1461 r3163509636). The issuer/JWKS URLs are no longer hard-tied to the Entra public cloud. Two new optional config keys override the auto-derived defaults:
- synthetic_auth_issuers: list of acceptable issuer strings
- synthetic_auth_jwks_uri: JWKS URL for signature verification

Defaults still derive from synthetic_auth_tenant_id and target the public Entra cloud (sts.windows.net + login.microsoftonline.com), so existing setups need no changes. Useful for:
- Entra Government (login.microsoftonline.us)
- Entra China (login.partner.microsoftonline.cn)
- Any future Entra-claim-shaped IdP at a non-default URL

Validation logic remains Entra-claim-shaped (appid, appidacr, roles), so this is not a generalization to arbitrary OIDC providers; that would require configurable claim names and per-provider validation, a meaningfully larger change. Documented this scope in the PR thread.

Tests +2: default-issuers-target-entra-public-cloud, issuer-and-jwks-overrides-apply.
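A small sketch of how the defaults and overrides could resolve. The function name is hypothetical; the default URLs follow the public-cloud patterns named in this PR, and the Government URLs in the usage note are illustrative values rather than tested configuration.

```python
from typing import List, Optional, Tuple


def resolve_issuers_and_jwks(
    tenant_id: str,
    issuers: Optional[List[str]] = None,
    jwks_uri: Optional[str] = None,
) -> Tuple[List[str], str]:
    """Return (accepted issuers, JWKS URL), preferring explicit overrides."""
    default_issuers = [
        f"https://sts.windows.net/{tenant_id}/",                # v1 issuer
        f"https://login.microsoftonline.com/{tenant_id}/v2.0",  # v2 issuer
    ]
    default_jwks = (
        f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"
    )
    # An empty or missing override falls back to the public-cloud default.
    return (list(issuers) if issuers else default_issuers,
            jwks_uri or default_jwks)
```

For a sovereign cloud, a caller would pass both overrides, for example `issuers=["https://login.microsoftonline.us/<tenant>/v2.0"]` together with the matching JWKS URL on the same host.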
Make synthetic-auth config-loading more diagnosable when staging deploys
mis-set env vars:
- Bumped startup ENABLED log to include issuer count and a clearer name.
- Added a startup WARNING when the path is NOT enabled, pointing at the
config keys to check. Previously a misconfigured synthetic_auth_enabled
silently dropped traffic to the IdP path with no signal.
- Hardened SyntheticAuthConfig parsing: env-var-substituted YAML can
deliver list/dict-shaped fields as JSON strings; _coerce_to_list now
parses those before validation so configs like
synthetic_auth_appid_allowlist: '${...}' work without a manual !!seq.
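One plausible shape for the coercion step is sketched below. Only the helper's name comes from this commit message; the body, and the choice to treat a non-JSON string as a single-element list, are assumptions.

```python
import json


def coerce_to_list(value):
    """Normalize a config value into a list before validation (sketch)."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    if isinstance(value, str):
        text = value.strip()
        if not text:
            return []
        if text.startswith("["):
            # Env-var substitution delivered a JSON array as a string.
            # Malformed JSON raises ValueError, which should disable the path.
            return json.loads(text)
        return [text]  # assumption: a bare string is one entry
    raise TypeError(f"cannot coerce {type(value).__name__} to list")
```

With this in place, a value such as `'["guid-1", "guid-2"]'` arriving from an environment variable validates the same way as a native YAML sequence.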
The synthetic principal authenticated cleanly but every downstream authorization check 403'd because AuthorizationService.get_scopes_for_user fell back to MS Graph for role lookup. MS Graph 404s on the synthetic-monitor identity since it isn't a real Entra user.

Per AuthorizationService.get_scopes_for_user (in solace-agent-mesh-enterprise), if `roles` is non-None on user_state, it uses those directly and skips role-provider lookup entirely. Same mechanism sam_access_token already uses to bypass MS Graph.

Adds a new required config key `synthetic_auth_roles` (list of role names defined in role-to-scope-definitions.yaml). build_synthetic_user_state now sets `roles` on the returned state. Fail-closed: empty roles list keeps the path disabled to avoid silently re-introducing the MS Graph fallback.

Tests: 33 → 35. New: empty-roles disables, user state carries roles.
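The mechanism above can be sketched as follows. The dictionary keys, the display name, and the `resolve_roles` helper are assumptions standing in for the real user_state shape and for the branch inside AuthorizationService.get_scopes_for_user, which lives in another repository.

```python
from typing import Callable, List


def synthetic_path_enabled(enabled: bool, roles: List[str]) -> bool:
    """Fail closed: an empty roles list keeps the synthetic path disabled."""
    return bool(enabled) and len(roles) > 0


def build_synthetic_user_state(roles: List[str]) -> dict:
    """Fixed synthetic principal carrying pre-resolved roles (key names assumed)."""
    return {
        "id": "synthetic-monitor",
        "email": "synthetic-monitor@synthetics.invalid",
        "name": "Synthetic Monitor",  # hypothetical display name
        "is_synthetic": True,
        "auth_method": "synthetic",
        "roles": list(roles),
    }


def resolve_roles(user_state: dict, lookup: Callable[[str], List[str]]) -> List[str]:
    """Pre-set roles win; only otherwise consult the role provider."""
    roles = user_state.get("roles")
    if roles is not None:
        return list(roles)
    return lookup(user_state["id"])  # e.g. an MS Graph-backed provider
```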
…d claims
_extract_initial_claims was building a fresh dict from request.state.user that included only id/name/email/user_info, dropping any 'roles' key that auth middleware had pre-populated. Downstream code (submit_a2a_task → resolve_user_config → _get_user_state_roles) checks for roles at the top level of user_identity, not inside user_info, so the synthetic principal's pre-resolved roles never reached AuthorizationService.get_scopes_for_user.

Effect was that even with synthetic_auth_roles=["SyntheticMonitor"] correctly set on user_state, the downstream scope check fell back to MS Graph role lookup, 404'd on the synthetic identity, and denied agent access.

Forward roles to top-level claims in both return paths so any auth method that pre-populates roles (sam_access_token already does this, synthetic now does too) propagates them to the agent-access check.
Make the trust boundary on the role-forwarding path explicit. Today
user_state["roles"] is only populated by two server-trusted code paths
(synthetic auth pulls from YAML config; sam_access_token pulls from a
signed internal token verified by trust_manager), so the unconditional
forward in _extract_initial_claims is safe in practice. But the check
was an "if 'roles' is present" pattern, which would silently start
forwarding user-controlled JWT roles the moment any future auth path
populated state.user["roles"] from request claims.
Tighten by allowlisting auth_method ∈ {synthetic, sam_access_token}.
Other paths fall through to the existing role-provider lookup. No
functional change for current callers.
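A minimal sketch of the tightened forwarding, assuming user state is a plain dict with an auth_method key. The function and constant names are illustrative, not the actual _extract_initial_claims implementation.

```python
# Only these server-trusted auth paths may pre-populate roles.
TRUSTED_ROLE_SOURCES = frozenset({"synthetic", "sam_access_token"})


def extract_initial_claims(user_state: dict) -> dict:
    """Build top-level claims, forwarding roles only from trusted paths."""
    claims = {
        "id": user_state.get("id"),
        "name": user_state.get("name"),
        "email": user_state.get("email"),
        "user_info": user_state.get("user_info", {}),
    }
    roles = user_state.get("roles")
    if roles is not None and user_state.get("auth_method") in TRUSTED_ROLE_SOURCES:
        claims["roles"] = list(roles)
    # Any other auth path gets no "roles" key, so downstream code falls
    # through to the existing role-provider lookup.
    return claims
```

Keying on auth_method rather than on the mere presence of roles means a future auth path that copies roles from request claims cannot have them forwarded by accident.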
What is the purpose of this change?
Adds a config-gated, non-interactive auth path so Datadog Synthetics (and any future machine client) can authenticate against the gateway without driving the SSO UI. Enables an upcoming end-to-end smoke test that sends a chat prompt and asserts a non-empty LLM response.
solace-chat today only supports OIDC Authorization Code flow against Entra ID — every authenticated caller must go through the user login UI. That doesn't work for synthetic monitoring.
How was this change implemented?
New module
src/solace_agent_mesh/shared/auth/synthetic.py:
- Validates the JWT locally against the tenant JWKS via PyJWKClient (existing dependency).
- Distinguishes "not a synthetic token" (SyntheticTokenNotApplicable, caller falls through to other auth paths) from "synthetic token but invalid" (SyntheticTokenInvalid, caller hard-rejects). This boundary prevents an attacker from probing the validator by varying claims.
Modified
src/solace_agent_mesh/shared/auth/middleware.py:
- _try_synthetic_auth runs ahead of the existing sam_access_token and IdP branches.
- A valid token maps to a fixed synthetic principal (user_id = "synthetic-monitor", non-routable email, is_synthetic flag) so downstream code that expects a current_user keeps working unchanged.
Key Design Decisions
- Strict role equality (roles == [config.role_name]) rather than contains. Prevents privilege creep if a future role is granted alongside.
- Non-routable email (synthetic-monitor@synthetics.invalid): can't collide with a real user, can't be probed via account enumeration.
- The is_synthetic schema flag and conversation cleanup are deferred as stretch goals. Synthetic conversations are identified by the well-known user_id = "synthetic-monitor"; cleanup can be added later without touching auth.
How was this change tested?
- tests/unit/shared/auth/test_synthetic.py (29 tests, all passing in 0.44s)
Is there anything the reviewers should focus on/be aware of?
- synthetic.py: each defense-in-depth check (signature, alg allowlist, iss, aud, tid, strict roles, appid allowlist, no-user-claims, fail-closed config) is independently load-bearing. Removing any one check creates a real vulnerability.
- The change lives in the solace-agent-mesh package, so the synthetic path is technically available to any deployment that opts in via config, not just solace-chat. Default-off behaviour means this is safe, but worth flagging.