
feat(DATAGO-130042): add local Entra ID JWT validation to auth middleware #1440

Closed
JKaram wants to merge 1 commit into main from JKaram/DATAGO-130042/aad-jwt-middleware

@JKaram JKaram commented Apr 23, 2026

What is the purpose of this change?

Adds a third authentication branch to shared/auth/middleware.py that validates Microsoft Entra ID JWTs locally via JWKS, positioned between the existing sam_access_token path and the IdP /user_info fallback.

Motivation. The existing IdP fallback path routes bearer tokens through the OAuth proxy's /user_info endpoint, which calls Microsoft Graph's /oidc/userinfo. Graph rejects app-only tokens (no sub claim — only oid and appid) because the OIDC userinfo spec requires sub. This blocks any caller authenticating as a service principal, e.g. CI workflows minting tokens via GitHub OIDC federation.

This PR adds a local-verification branch so app-only Entra ID tokens can reach the backend without touching Graph. Users with real Graph-compatible tokens are unaffected — they still flow through the existing IdP path.

Jira: DATAGO-130042

Companion PR

Consumed by SolaceDev/solace-chat#428, which:

  • Mints an app-only Entra ID token in CI via GitHub OIDC federation (no client secret)
  • Hands it to Playwright as Authorization: Bearer <token> for smoke tests against staging
  • Adds the aad_tenant_id / aad_audience / aad_issuer_override config keys to its webui.yaml that activate the branch introduced here

Without this PR, those config keys are defined but have no consumer — the bearer would still 401 against /user_info.

How is this accomplished?

New module: src/solace_agent_mesh/shared/auth/jwt_validator.py

Generic local JWKS-based JWT verification, with Entra ID as the first supported issuer (module deliberately named jwt_validator rather than aad_jwt_validator so a future Okta/Auth0/etc. local-verify caller can reuse the shape). Public surface:

  • AadValidatorConfig — tenant, audience, optional issuer/JWKS overrides, 60s leeway. accepted_audiences() normalises between api://<guid> and bare <guid> so either CI configuration works.
  • AadTokenValidator.validate(token) -> AadValidationOutcome — three-outcome tagged union (enum, not string): VALID, NOT_AAD, INVALID. No exceptions for control flow.
  • AadClaims — structured view of the validated payload with an is_service_principal flag derived from the idtyp claim (falls back to sub/oid heuristic).
  • build_validator() — factory. PyJWKClient construction is lightweight (no network I/O); the first validate() call fetches JWKS and caches it for 300s.
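
The audience normalisation described for accepted_audiences() can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the class name ValidatorConfigSketch and its field names are assumptions, and only the api://<guid> versus bare <guid> behaviour and the 60s leeway come from the description above.

```python
from dataclasses import dataclass

API_PREFIX = "api://"


@dataclass(frozen=True)
class ValidatorConfigSketch:
    """Hypothetical stand-in for AadValidatorConfig (field names assumed)."""

    tenant_id: str
    audience: str
    leeway_seconds: int = 60

    def accepted_audiences(self) -> tuple[str, ...]:
        """Return both spellings of the configured audience.

        Whether the operator configured "api://<guid>" or the bare "<guid>",
        a token carrying either form in its aud claim is accepted.
        """
        bare = self.audience.removeprefix(API_PREFIX)
        return (bare, f"{API_PREFIX}{bare}")
```

Returning both forms lets the caller hand the whole tuple to the audience check instead of branching on how the value was configured.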

Three-outcome contract:

| Outcome | Trigger | Middleware reaction |
| --- | --- | --- |
| VALID | sig + iss/aud/tid/exp/nbf all pass | Build request.state.user, 200 path |
| NOT_AAD | PyJWKClientError (kid unknown), non-JWT shape, issuer mismatch | Fall through to IdP branch |
| INVALID | kid resolved but bad sig, expired, missing claim, alg=none/HS256 attempt, aud mismatch, tid mismatch | Hard 401, no fallthrough |

The aud/tid mismatch → INVALID (not NOT_AAD) rule is a confused-deputy defense. Once a kid resolves in Entra ID's JWKS, the token is claiming to be for us — any failure past that point is a hard reject rather than handing the token to another branch that might happen to accept it.
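
One way to picture the three-outcome contract is the sketch below. It is an illustration under assumptions, not the validator's real control flow: classify, resolve_key, and verify are hypothetical names, the key lookup and the signature/claim checks are injected as callables so the example stays standard-library only, and the ordering (unverified issuer peek first, then kid resolution, then everything else) is assumed from the trigger column rather than confirmed by the PR.

```python
import base64
import json
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    VALID = "valid"
    NOT_AAD = "not_aad"
    INVALID = "invalid"


def _peek(segment: str) -> dict:
    """Decode one base64url JWT segment without verifying anything."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def classify(
    token: str,
    expected_issuer: str,
    resolve_key: Callable[[str], Optional[object]],  # returns None if kid unknown
    verify: Callable[[str, object], bool],  # sig + aud/tid/exp/nbf checks
) -> Outcome:
    parts = token.split(".")
    if len(parts) != 3:
        return Outcome.NOT_AAD  # not JWT-shaped: let another branch try
    try:
        header, payload = _peek(parts[0]), _peek(parts[1])
    except ValueError:
        return Outcome.NOT_AAD  # segments are not base64url-encoded JSON
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return Outcome.NOT_AAD
    if payload.get("iss") != expected_issuer:
        return Outcome.NOT_AAD  # some other issuer's token
    key = resolve_key(header.get("kid", ""))
    if key is None:
        return Outcome.NOT_AAD  # kid not present in the configured JWKS
    # Past this point the token claims to be ours: any failure is a hard reject.
    return Outcome.VALID if verify(token, key) else Outcome.INVALID
```

The important property is the last line: once the key resolves, there is no path back to NOT_AAD, so a token that fails audience or tenant checks can never be handed to the IdP branch.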

Middleware integration (src/solace_agent_mesh/shared/auth/middleware.py)

  • Ordered branches in _handle_authenticated_request: sam_access_token → Entra ID JWT → IdP. The AAD branch only runs when both aad_tenant_id and aad_audience are configured.
  • _get_or_build_aad_validator(tenant, audience) helper caches one validator per (tenant, audience) tuple on the middleware instance. Lazy import of jwt_validator keeps PyJWKClient / cryptography off the import path for deployments that don't opt in.
  • Eager mount-time config validation: logs info when enabled, warning + disables the branch if only one of the pair is set (fail noisy, not silent 401s).
  • Soft-auth mode: INVALID outcome sets request.state.auth_probe = True and proceeds, matching the existing pattern for other branches.
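
The config gating and per-(tenant, audience) caching described above might look roughly like this. It is a sketch only: aad_branch_enabled and ValidatorCache are hypothetical names, the log messages are invented for illustration, and the factory argument stands in for the lazily imported build_validator().

```python
import logging
from typing import Callable, Dict, Optional, Tuple

log = logging.getLogger(__name__)


def aad_branch_enabled(tenant_id: Optional[str], audience: Optional[str]) -> bool:
    """Enable the branch only when both keys are set; warn on a partial config."""
    if tenant_id and audience:
        log.info("Entra ID JWT branch enabled for tenant %s", tenant_id)
        return True
    if tenant_id or audience:
        log.warning(
            "Only one of aad_tenant_id / aad_audience is set; "
            "Entra ID JWT branch disabled"
        )
    return False


class ValidatorCache:
    """Build at most one validator per (tenant, audience) pair."""

    def __init__(self, factory: Callable[[str, str], object]) -> None:
        # The factory is only called on first use, so deployments that never
        # reach this branch never pay for constructing a validator.
        self._factory = factory
        self._validators: Dict[Tuple[str, str], object] = {}

    def get_or_build(self, tenant_id: str, audience: str) -> object:
        key = (tenant_id, audience)
        if key not in self._validators:
            self._validators[key] = self._factory(tenant_id, audience)
        return self._validators[key]
```

Keying on the tuple rather than on the tenant alone means a changed audience gets a fresh validator instead of silently reusing one built for the old value.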

Identity shape

  • user_id = first of sub, oid, appid
  • email = first of email, preferred_username, upn, else svc-principal+{appid}@aad-app-only.invalid. The IANA-reserved .invalid TLD (RFC 2606) means the sentinel can never coincidentally match an allowed_domains / shared_user_emails entry.
  • is_service_principal + service_principal_id added to request.state.user so downstream code can branch on identity type without regexing the email.
  • auth_method = "aad_jwt" flag for downstream branching.
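
The claim-precedence rules above can be sketched as a small mapping function. The function name build_user and the helper _first are hypothetical, and using appid as the value of service_principal_id is an assumption; the precedence order, the sentinel format, and the auth_method value are taken from the list above.

```python
from typing import Any, Dict, Optional


def _first(claims: Dict[str, Any], *names: str) -> Optional[str]:
    """Return the first non-empty claim among names, or None."""
    for name in names:
        value = claims.get(name)
        if value:
            return str(value)
    return None


def build_user(claims: Dict[str, Any], is_service_principal: bool) -> Dict[str, Any]:
    appid = claims.get("appid")
    email = _first(claims, "email", "preferred_username", "upn")
    if email is None:
        # .invalid is reserved by RFC 2606, so this sentinel cannot collide
        # with a real mailbox or an allow-listed domain.
        email = f"svc-principal+{appid}@aad-app-only.invalid"
    return {
        "user_id": _first(claims, "sub", "oid", "appid"),
        "email": email,
        "is_service_principal": is_service_principal,
        # Assumption: the service principal is identified by its appid.
        "service_principal_id": appid if is_service_principal else None,
        "auth_method": "aad_jwt",
    }
```

For an app-only token carrying only oid and appid, this yields the oid as user_id and the sentinel address as email, so downstream code can rely on is_service_principal rather than parsing the address.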

Dependencies: no new ones. pyjwt>=2.12.0 and cryptography==46.0.7 are already in pyproject.toml.

Security notes

Verified non-issues (confirmed during security review):

  • alg=none / HS256 key confusion: algorithms=("RS256",) is passed explicitly to jwt.decode. PyJWT 2.x removed auto-detect; passing algorithms is mandatory and RS256-only blocks both.
  • jku / x5u header injection: PyJWKClient ignores these; only kid is used against the configured URL.
  • Confused-deputy via mixed branches — aud/tid mismatch post-kid-resolution is hard 401, does not fall through.
  • Email-keyed authorization — audited share.py::can_be_accessed_by_user, shared_user_emails, project_service.py::get_accessible_projects, routers/auth.py, routers/users.py, local_file_identity_service.py. All use exact-match equality or display-only; the .invalid-TLD sentinel is non-matchable across all of them.
  • Dev bypass interaction: _handle_authenticated_request is not called when frontend_use_authorization=false (short-circuit earlier in the file). The new branch is unreachable in dev deployments.
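
To illustrate the RS256-only allow-list from the first note, the sketch below signs one token with a throwaway RSA key and one with an HMAC secret, then decodes both with algorithms restricted to RS256. It requires PyJWT and cryptography (both already listed as dependencies). The issuer, audience, and helper names are hypothetical, and the HS256 token uses an arbitrary secret rather than reproducing a real key-confusion forgery.

```python
import time

import jwt  # PyJWT
from cryptography.hazmat.primitives.asymmetric import rsa

# Throwaway keypair standing in for an Entra ID signing key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

ISSUER = "https://issuer.example/v2.0"  # hypothetical issuer
AUDIENCE = "api://example-audience"     # hypothetical audience

claims = {"iss": ISSUER, "aud": AUDIENCE, "exp": int(time.time()) + 300}


def decode_rs256_only(token: str) -> dict:
    """Verify signature, iss, aud, and exp, accepting RS256 and nothing else."""
    return jwt.decode(
        token,
        key=public_key,
        algorithms=["RS256"],  # the allow-list: any other alg is rejected
        audience=AUDIENCE,
        issuer=ISSUER,
        leeway=60,
    )


def is_rejected(token: str) -> bool:
    try:
        decode_rs256_only(token)
    except jwt.InvalidTokenError:
        return True
    return False


good = jwt.encode(claims, private_key, algorithm="RS256")
# Simplified stand-in for a key-confusion attempt: an HS256-signed token.
hs256_token = jwt.encode(claims, "0" * 32, algorithm="HS256")
```

Here is_rejected(hs256_token) is true because the header's alg is not in the allow-list, so the HMAC signature is never even evaluated against the RSA public key; a token with alg=none fails the same check.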

Tests

29 new unit tests — all passing, and the full community suite (4959/4959) remains green.

tests/unit/auth/test_jwt_validator.py (20 tests) — validator in isolation. Generates an RSA keypair via cryptography, signs test tokens with PyJWT, stubs PyJWKClient.get_signing_key_from_jwt:

  • Valid app-only / valid user tokens (claim extraction, is_service_principal flag)
  • Expired, not-yet-valid, bad-signature → INVALID
  • alg=none, HS256 key-confusion (hand-forged with hmac.new) → INVALID
  • Wrong audience → INVALID (post-kid-resolution tightening)
  • Wrong tenant tid → INVALID
  • Wrong issuer → NOT_AAD
  • Kid not in JWKS → NOT_AAD
  • Non-JWT shape / empty token → NOT_AAD
  • Missing tid → INVALID
  • Leeway absorbs small clock skew
  • accepted_audiences() normalisation both directions (api://<guid> ↔ bare GUID)

tests/unit/auth/test_middleware_aad.py (9 tests) — middleware integration. Reuses MockComponent/MockTrustManager patterns, stubs the validator to return canned outcomes:

  • VALID user token / VALID app-only token (asserts .invalid TLD sentinel + is_service_principal)
  • INVALID → 401 JSON response, IdP path not called
  • INVALID under soft-auth → request.state.auth_probe = True, returns False
  • NOT_AAD → IdP fall-through path runs
  • No config → branch skipped, existing behaviour preserved
  • Partial config → branch disabled, warning logged at mount
  • sam_token precedence: AAD branch does not run when sam_token succeeds

Anything reviewers should focus on/be aware of?

  • Opt-in only. With aad_tenant_id / aad_audience unset (default), the branch is skipped entirely — zero behaviour change for existing deployments. The lazy import means even the PyJWKClient dependency isn't touched.
  • Confused-deputy rule (aud/tid mismatch = hard 401). Deliberate — a previous draft of this plan fell through on audience mismatch, which relied on the downstream proxy also enforcing audience. Tighter rule here guards against proxy misconfiguration.
  • .invalid TLD sentinel. Intentionally non-matchable. Every email-keyed authorization call site was audited before merge; findings are in the plan doc under "Risks §A".
  • sam_access_token log downgrade (warning → debug). Once this lands, every real Entra ID token passes through the sam-token except clause before succeeding in the new branch. Warning-level volume would explode. Downgrade prevents that — failures are now only loud when they end up as 401s.
  • Out of scope for this PR: multi-audience support, v1-token sts.windows.net issuer, Azure sovereign clouds, conditional access / revocation checks, lifting the branch into a formal provider class. All tracked as follow-ups in the plan.
  • Deployment gating: companion PR (solace-chat#428) must have its staging pod configured with AAD_TENANT_ID, AAD_AUDIENCE, FRONTEND_USE_AUTHORIZATION=true before the cron will succeed end-to-end.

Verification

cd solace-agent-mesh
uv run pytest tests/unit/auth/ -v    # 29/29 green
uv run pytest                        # full community suite green

After merge + staging deploy, trigger playwright-smoke-tests.yaml on solace-chat via workflow_dispatch — expected: end-to-end green (OIDC mint → bearer → local JWKS verify → 200).

…ware

Add a third branch in shared/auth/middleware.py that validates Entra ID
JWTs locally via JWKS, positioned between the existing sam_access_token
path and the IdP /user_info fallback.

Motivation: CI-minted service-principal tokens (via GitHub OIDC
federation) are app-only and lack a `sub` claim, so the existing
fallback path's call to Microsoft Graph /oidc/userinfo rejects them.
Local JWKS verification sidesteps Graph entirely.

Security notes:
- Algorithm whitelist RS256 blocks alg=none and HS256 key confusion
- aud/tid mismatch post-kid-resolution returns 401 (confused-deputy
  defense), not fall-through
- App-only tokens get a `.invalid` TLD sentinel email (RFC 2606) so
  they can never coincidentally match allowed_domains/shared_user_emails
- No new dependencies; PyJWT + cryptography already present

Includes 29 unit tests covering validator outcomes, middleware branch
ordering, and soft-auth probe behaviour.
@github-actions

✅ FOSSA Guard: Licensing (SolaceLabs_solace-agent-mesh) • PASSED

Compared against main (d7e9b4a723df691477ad87ba4e6374819ba12722) • 0 new, 9 total (9 in base)


@github-actions

✅ FOSSA Guard: Vulnerability (SolaceLabs_solace-agent-mesh) • PASSED

Compared against main (d7e9b4a723df691477ad87ba4e6374819ba12722) • 0 new, 6 total (6 in base)


@JKaram JKaram closed this May 5, 2026