feat(DATAGO-130042): synthetic-monitor auth path for Datadog smoke tests by JKaram · Pull Request #1461 · SolaceLabs/solace-agent-mesh

JKaram · 2026-04-27T15:58:03Z

What is the purpose of this change?

Adds a config-gated, non-interactive auth path so Datadog Synthetics (and any future machine client) can authenticate against the gateway without driving the SSO UI. Enables an upcoming end-to-end smoke test that sends a chat prompt and asserts a non-empty LLM response.

solace-chat today only supports OIDC Authorization Code flow against Entra ID — every authenticated caller must go through the user login UI. That doesn't work for synthetic monitoring.

How was this change implemented?

New module src/solace_agent_mesh/shared/auth/synthetic.py:

Loads Datadog client app GUIDs, role name, audience, tenant ID, and an endpoint allowlist from component config. Fail-closed (any missing required field, empty appid allowlist, or empty endpoint allowlist disables the path entirely).
Validates Entra-issued client-credentials JWTs locally via PyJWKClient (existing dependency).
Distinguishes "not a synthetic token" (SyntheticTokenNotApplicable, caller falls through to other auth paths) from "synthetic token but invalid" (SyntheticTokenInvalid, caller hard-rejects). This boundary prevents an attacker from probing the validator by varying claims.

Modified src/solace_agent_mesh/shared/auth/middleware.py:

Loads synthetic config once at startup; logs a single info line when enabled.
New _try_synthetic_auth runs ahead of the existing sam_access_token and IdP branches.
Maps a valid synthetic token to a fixed principal (user_id = "synthetic-monitor", non-routable email, is_synthetic flag) so downstream code that expects a current_user keeps working unchanged.
Endpoint allowlist (method + path regex) enforced at the auth layer before any handler runs. Default deny.

Key Design Decisions

Local JWKS validation rather than delegating to the external auth service. Synthetic auth is security-sensitive (strict role match, appid allowlist, mixed-claim rejection) and co-locating validation with principal construction and endpoint enforcement makes it easier to audit.
Hard reject (don't fall through) on synthetic-shaped invalid tokens. A token claiming to be synthetic that fails validation is suspicious — falling through to other auth paths would let an attacker probe by varying claims.
Strict role equality (roles == [config.role_name]) rather than contains. Prevents privilege creep if a future role is granted alongside.
Non-routable synthetic email (synthetic-monitor@synthetics.invalid) — can't collide with a real user, can't be probed via account enumeration.
Fail-closed config loader — refuses to enable with empty allowlists. An empty appid allowlist would otherwise accept any tenant app with the role; an empty endpoint allowlist would deny everything.
is_synthetic schema flag and conversation cleanup are deferred as stretch goals. Synthetic conversations are identified by the well-known user_id = "synthetic-monitor"; cleanup can be added later without touching auth.

How was this change tested?

Unit tests: tests/unit/shared/auth/test_synthetic.py (29 tests, all passing in 0.44s)
- Discriminator boundary (NotApplicable vs Invalid)
- Every rejection path: alg=none, wrong audience/issuer/tenant, appid not allowlisted, extra roles, missing role, expired token, each user-identity claim parameterized
- Happy path returns expected claims
- Config fail-closed cases (disabled-by-default, missing field, empty allowlists)
- Endpoint allowlist incl. method mismatch and path-extension attempts
Tests use a real RSA keypair generated per session — PyJWT signature/claim validation is actually exercised, not mocked
Integration tests against a running gateway: not yet — gated on Phase 2 (Entra setup)
Known limitation: rate limiting for the synthetic principal isn't implemented in this PR (lives elsewhere in the stack); needs a separate decision

Is there anything the reviewers should focus on/be aware of?

Security review of synthetic.py: each defense-in-depth check (signature, alg allowlist, iss, aud, tid, strict roles, appid allowlist, no-user-claims, fail-closed config) is independently load-bearing. Removing any one check creates a real vulnerability.
Ordering in middleware: the synthetic branch runs first when enabled; it must remain that way so synthetic tokens don't accidentally take the IdP fallback path.

Disabled by default. No behaviour change for existing deployments. Each environment (staging, prod) opts in via config:

synthetic_auth_enabled: true
synthetic_auth_tenant_id: <entra-tenant-guid>
synthetic_auth_audience: api://<solace-chat-app-id-uri>
synthetic_auth_role_name: Synthetics.Smoke
synthetic_auth_appid_allowlist: [<datadog-sp-appid-guid>]
synthetic_auth_endpoint_allowlist:
  - { method: GET, path: "^/api/v1/sessions$" }
  - { method: POST, path: "^/api/v1/messages$" }
  # plus whatever the chat smoke actually needs

Endpoint allowlist contents are placeholders — the real list depends on what the chat send-message + response-stream endpoints are exactly. Will be tightened during Phase 2 of the rollout once the smoke test is wired up.
The middleware lives in the shared solace-agent-mesh package, so the synthetic path is technically available to any deployment that opts in via config — not just solace-chat. Default-off behaviour means this is safe, but worth flagging.

Adds a config-gated, non-interactive auth path so Datadog Synthetics (and any future machine client) can authenticate against the gateway without driving the SSO UI. Used by an upcoming end-to-end smoke test that sends a chat prompt and asserts a non-empty LLM response. How it works: - Caller mints an Entra ID JWT via OAuth client_credentials with a dedicated app role (no user identity in the token). - New synthetic.py validates that JWT locally against the tenant JWKS and maps it to a fixed synthetic principal (user_id = "synthetic- monitor", non-routable email, is_synthetic flag) so downstream code that expects a current_user keeps working. - Middleware runs the synthetic branch ahead of sam_access_token and IdP paths. Synthetic-shaped tokens that fail validation are hard- rejected (401) rather than falling through, so an attacker can't probe the validator by varying claims. Security controls (each independently load-bearing): - RS256-only; alg=none and HS variants refused by PyJWT. - Standard JWT validation: signature, iss, aud, tid, exp, nbf, iat with 60s skew. - Strict roles == [configured-role] (no privilege creep via extras). - appid/azp must be in a configured allowlist of Datadog SP GUIDs. - Any user-identity claim (sub/oid/preferred_username/upn/unique_name) hard-rejects the token. - Endpoint allowlist (method + path regex) enforced at the auth layer before any handler runs. Default deny. - Config fail-closed: missing required field, empty appid allowlist, or empty endpoint allowlist all disable the path entirely. Disabled by default. Enabled via component config: synthetic_auth_enabled synthetic_auth_tenant_id synthetic_auth_audience synthetic_auth_role_name synthetic_auth_appid_allowlist synthetic_auth_endpoint_allowlist Tests cover discriminator boundary (NotApplicable vs Invalid), every rejection path, happy path, config fail-closed cases, and endpoint allowlist behaviour including method mismatch and path-extension attempts.

github-actions · 2026-04-27T16:00:48Z

✅ FOSSA Guard: Licensing (`SolaceLabs_solace-agent-mesh`) • PASSED

Compared against main (9b6a9c31cb6e44135812ad01cc06c612c0c8bd23) • 0 new, 9 total (9 in base)

Scan Report | View Details in FOSSA

github-actions · 2026-04-27T16:00:58Z

✅ FOSSA Guard: Vulnerability (`SolaceLabs_solace-agent-mesh`) • PASSED

Compared against main (9b6a9c31cb6e44135812ad01cc06c612c0c8bd23) • 0 new, 14 total (14 in base)

Scan Report | View Details in FOSSA

Hugo-Pare · 2026-04-29T19:06:04Z

+        issuer = f"https://login.microsoftonline.com/{tenant_id}/v2.0"
+        jwks_uri = f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"


Do you think we could allow other OIDC providers too, similar to how it is done here?
https://github.com/SolaceDev/solace-agent-mesh-enterprise/blob/main/examples/oauth2_config.yaml#L44

Fair point - But, Going generic across OIDC providers (Okta/Auth0/etc.) is a bigger change than the URL config though — claim names and semantics differ across IdPs. The oauth2_config.yaml analogy works for user-OIDC because sub/email/name are pretty standard, but client-credentials tokens are much less standardized. If a second IdP becomes a real requirement we can add a per-provider abstraction at that point — likely cleaner than parameterizing what's currently here.

For now this is solely for our internal Datadog Synthetic test against solace-chat, so I'd rather not preemptively build the generic shape.

The first real client_credentials token from Entra revealed two over-strict checks that would reject every legitimately-issued token: - Entra defaults to v1 issuer (sts.windows.net) for client_credentials unless the resource opts into v2 via accessTokenAcceptedVersion=2. Accept both v1 and v2 issuers for the configured tenant. - Entra includes sub and oid in every app-only token to identify the service principal; they are not user identity. Drop them from the user-claim rejection list. preferred_username, upn, and unique_name remain — those only appear in user-context tokens. Add appidacr ("Application Authentication Context Reference") as a compensating control: must be "1" (client secret) or "2" (certificate). Rejects "0" (public client, no auth) and missing-claim cases. Tests: 29 → 31. New cases cover v1-issuer acceptance, sub/oid acceptance, appidacr="0" rejection, and missing-appidacr rejection.

…peek SonarQube's python:S5659 rightly flags any unverified jwt.decode call. This particular call is a routing-only discriminator: it decides whether to route the token through the synthetic auth branch vs fall through to IdP/sam_access_token, before we know if Entra-JWKS signature verification will succeed. The peek is never used for any auth decision — all auth decisions (appid allowlist, role strict match, claim-absence checks, audience, issuer, tenant, expiry) use the verified claims obtained from a second jwt.decode call below that does verify the signature against the JWKS. A forged token that passes the routing peek is hard-rejected at the signature verification step. Strengthen the inline comment to make the intent and safety bound unambiguous, and add NOSONAR(python:S5659) so reviewers see the suppression is deliberate.

Address review feedback (#1461 r3163509636). The issuer/JWKS URLs are no longer hard-tied to the Entra public cloud — two new optional config keys override the auto-derived defaults: - synthetic_auth_issuers: list of acceptable issuer strings - synthetic_auth_jwks_uri: JWKS URL for signature verification Defaults still derive from synthetic_auth_tenant_id and target the public Entra cloud (sts.windows.net + login.microsoftonline.com), so existing setups need no changes. Useful for: - Entra Government (login.microsoftonline.us) - Entra China (login.partner.microsoftonline.cn) - Any future Entra-claim-shaped IdP at a non-default URL Validation logic remains Entra-claim-shaped (appid, appidacr, roles), so this is not a generalization to arbitrary OIDC providers — that would require configurable claim names and per-provider validation, a meaningfully larger change. Documented this scope in the PR thread. Tests +2: default-issuers-target-entra-public-cloud, issuer-and-jwks-overrides-apply.

Make synthetic-auth config-loading more diagnosable when staging deploys mis-set env vars: - Bumped startup ENABLED log to include issuer count and a clearer name. - Added a startup WARNING when the path is NOT enabled, pointing at the config keys to check. Previously a misconfigured synthetic_auth_enabled silently dropped traffic to the IdP path with no signal. - Hardened SyntheticAuthConfig parsing: env-var-substituted YAML can deliver list/dict-shaped fields as JSON strings; _coerce_to_list now parses those before validation so configs like synthetic_auth_appid_allowlist: '${...}' work without a manual !!seq.

The synthetic principal authenticated cleanly but every downstream authorization check 403'd because AuthorizationService.get_scopes_for_user fell back to MS Graph for role lookup. MS Graph 404s on the synthetic-monitor identity since it isn't a real Entra user. Per AuthorizationService.get_scopes_for_user (in solace-agent-mesh- enterprise), if `roles` is non-None on user_state, it uses those directly and skips role-provider lookup entirely. Same mechanism sam_access_token already uses to bypass MS Graph. Adds a new required config key `synthetic_auth_roles` (list of role names defined in role-to-scope-definitions.yaml). build_synthetic_user_state now sets `roles` on the returned state. Fail-closed: empty roles list keeps the path disabled to avoid silently re-introducing the MS Graph fallback. Tests: 33 → 35. New: empty-roles disables, user state carries roles.

…d claims _extract_initial_claims was building a fresh dict from request.state.user that included only id/name/email/user_info — dropping any 'roles' key that auth middleware had pre-populated. Downstream code (submit_a2a_task → resolve_user_config → _get_user_state_roles) checks for roles at the top level of user_identity, not inside user_info, so the synthetic principal's pre-resolved roles never reached AuthorizationService.get_scopes_for_user. Effect was that even with synthetic_auth_roles=["SyntheticMonitor"] correctly set on user_state, the downstream scope check fell back to MS Graph role lookup, 404'd on the synthetic identity, and denied agent access. Forward roles to top-level claims in both return paths so any auth method that pre-populates roles (sam_access_token already does this, synthetic now does too) propagates them to the agent-access check.

Make the trust boundary on the role-forwarding path explicit. Today user_state["roles"] is only populated by two server-trusted code paths (synthetic auth pulls from YAML config; sam_access_token pulls from a signed internal token verified by trust_manager), so the unconditional forward in _extract_initial_claims is safe in practice. But the check was an "if 'roles' is present" pattern, which would silently start forwarding user-controlled JWT roles the moment any future auth path populated state.user["roles"] from request claims. Tighten by allowlisting auth_method ∈ {synthetic, sam_access_token}. Other paths fall through to the existing role-provider lookup. No functional change for current callers.

sonarqube-solacecloud · 2026-05-06T20:59:50Z

Quality Gate passed

Issues
2 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
66.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

JKaram requested review from enavitan and mo-radwan1 April 28, 2026 18:44

Hugo-Pare reviewed Apr 29, 2026

View reviewed changes

JKaram added 10 commits May 5, 2026 11:14

Merge branch 'main' into JKaram/DATAGO-130042/synthetic-monitor-auth

3444e25

Merge branch 'main' into JKaram/DATAGO-130042/synthetic-monitor-auth

d3cd004

Merge branch 'main' into JKaram/DATAGO-130042/synthetic-monitor-auth

62b7b15

JKaram requested a review from macstewart May 7, 2026 13:22

JKaram changed the title ~~feat(DATAGO-130042): synthetic-monitor auth path for Datadog smoke~~ feat(DATAGO-130042): synthetic-monitor auth path for Datadog smoke tests May 7, 2026

amir-ghasemi approved these changes May 11, 2026

View reviewed changes

JKaram merged commit 9a76c0f into main May 11, 2026
29 of 31 checks passed

JKaram deleted the JKaram/DATAGO-130042/synthetic-monitor-auth branch May 11, 2026 13:36

solace-re-bot-1 mentioned this pull request May 11, 2026

chore: release 1.25.0 #1510

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(DATAGO-130042): synthetic-monitor auth path for Datadog smoke tests#1461

feat(DATAGO-130042): synthetic-monitor auth path for Datadog smoke tests#1461
JKaram merged 11 commits into
mainfrom
JKaram/DATAGO-130042/synthetic-monitor-auth

JKaram commented Apr 27, 2026

Uh oh!

github-actions Bot commented Apr 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 27, 2026 •

edited

Loading

Uh oh!

Hugo-Pare Apr 29, 2026

Uh oh!

JKaram Apr 29, 2026 •

edited

Loading

Uh oh!

sonarqube-solacecloud Bot commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		issuer = f"https://login.microsoftonline.com/{tenant_id}/v2.0"
		jwks_uri = f"https://login.microsoftonline.com/{tenant_id}/discovery/v2.0/keys"

Conversation

JKaram commented Apr 27, 2026

What is the purpose of this change?

How was this change implemented?

Key Design Decisions

How was this change tested?

Is there anything the reviewers should focus on/be aware of?

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ FOSSA Guard: Licensing (SolaceLabs_solace-agent-mesh) • PASSED

Uh oh!

github-actions Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ FOSSA Guard: Vulnerability (SolaceLabs_solace-agent-mesh) • PASSED

Uh oh!

Hugo-Pare Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

JKaram Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqube-solacecloud Bot commented May 6, 2026

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented Apr 27, 2026 •

edited

Loading

✅ FOSSA Guard: Licensing (`SolaceLabs_solace-agent-mesh`) • PASSED

github-actions Bot commented Apr 27, 2026 •

edited

Loading

✅ FOSSA Guard: Vulnerability (`SolaceLabs_solace-agent-mesh`) • PASSED

JKaram Apr 29, 2026 •

edited

Loading