
DM-54689: GitHub push webhook + PushEventProcessor #264

Merged
jonathansick merged 17 commits into main from
tickets/DM-54689-github-webhook
May 1, 2026

Conversation

@jonathansick (Member) commented Apr 25, 2026

Summary

  • New POST /docverse/webhooks/github validates GitHub App HMAC signatures, dispatches push events through gidgethub.routing.Router, and enqueues exactly one dashboard_sync job per binding whose (owner, repo, ref) matches and whose root_path overlaps the changed-file set, falling back to the compare API when GitHub truncates the in-payload commit list. Unrelated event types get a 200 no-op response so GitHub does not redeliver them.
  • Webhook lookup and binding storage now agree on a canonical bare ref form: a Pydantic field_validator(mode="before") strips the refs/heads/ and refs/tags/ prefixes from DashboardTemplateBindingCreate.github_ref on input, and the push processor strips the same prefixes from payload["ref"] before the binding lookup. Both sites call one shared normalize_github_ref helper. Fixes a silent webhook-ignore where a binding stored as main never matched an incoming refs/heads/main event.
  • Org-admin binding endpoints carry richer responses for operator diagnostics: commit_sha (latest synced commit, null until first success), web_url (clickable GitHub browse URL), last_sync_queue_job_url (back-pointer to the most recent dashboard_sync queue job, populated atomically inside the enqueuer and surfaced by a left-join on read). HATEOAS dashboard_template_url lands on the parent Organization and Project responses too.
  • Force-sync moves out of /admin/ to POST /orgs/{org}/dashboard-template/sync and POST /orgs/{org}/projects/{project}/dashboard-template/sync — same response shape, but org admins can call them without super-admin escalation, and the public API no longer leaks integer database ids in the URL.
  • Startup validation: API service lifespan and arq worker startup mint a JWT and call GET /app once when all three GitHub App secrets are configured. On any failure (malformed key, 401 from GitHub, etc.) the validator logs a structured ERROR, flips a process-wide github_app_validated flag off, and the binding endpoints + webhook return their feature-disabled response for the lifetime of the process — operators see misconfiguration in startup logs instead of a worker traceback after the first dequeue. JWT-key parse errors and any other unanticipated syncer exception now also flip the binding to last_sync_status="failed" with a descriptive last_sync_error, so a misconfigured GitHub App can never leave a binding silently stuck at pending.
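The shared helper named above can be sketched minimally. Its module location is an assumption; the name and the prefix-stripping behavior come from the summary:

```python
def normalize_github_ref(ref: str) -> str:
    """Return the bare branch or tag name for a fully qualified Git ref."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref  # already bare: normalization is idempotent
```

Because both the Pydantic validator and the push processor call the same function, a re-PUT with either form stores and matches the same bare ref.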

Validation steps

  • With GitHub App secrets unset, POST /docverse/webhooks/github responds 404.
  • With secrets set, an unsigned POST → 401; a POST signed with the wrong secret → 401; a correctly signed push event → 200 and lands a dashboard_sync job on the queue when at least one binding's root_path overlaps the changed paths.
  • A push that only touches paths outside every binding's root_path is accepted (200) but enqueues nothing — verify by counting dashboard_sync jobs before/after.
  • A truncated push payload (size > len(commits)) triggers the compare-API fallback and still correctly enqueues a sync when the compare-derived path set intersects a binding's root_path.
  • A signed event for a type we do not subscribe to (e.g. ping) returns 200 with no enqueue.
  • On API service startup with all three GitHub App secrets set and a valid GitHub App: the structured INFO log "GitHub App config validated" is emitted with app_id, and the binding endpoints + webhook continue to behave as enabled.
  • On API service startup with all three secrets set but a malformed private key (or an app_id that GitHub rejects with 401 on GET /app): the structured ERROR log "GitHub App config invalid; disabling feature" is emitted, the service still starts, and POST /docverse/webhooks/github thereafter responds 404.
  • On API service startup with any of the three secrets unset: no validator log is emitted (no GET /app is issued).
  • On arq worker startup, the same four scenarios produce the same observable log + feature-disabled behaviour.
  • PUT /docverse/orgs/{org}/dashboard-template returns a body with last_sync_queue_job_url pointing at the enqueued dashboard_sync job (URL ends with /queue/jobs/{base32_public_id}); a follow-up GET on the same endpoint returns the same URL until the next sync attempt overwrites it; a re-PUT that changes any source field flips the URL to the new job's id.
  • When the fire-and-forget enqueue raises (simulated by monkeypatching DashboardSyncEnqueuer.enqueue), the PUT still succeeds (201/200), the response's last_sync_queue_job_url is null, and the binding row's last_sync_status is failed with a descriptive last_sync_error.
  • PUT /docverse/orgs/{org}/dashboard-template with github_ref = "refs/heads/main" returns 201 and the response body shows github_ref = "main"; a follow-up GET returns the same canonical form. The same holds for refs/tags/v1.0 → v1.0. A re-PUT with the bare form is idempotent.
  • A signed push webhook with payload.ref = "refs/heads/main" enqueues exactly one dashboard_sync for a binding stored with the bare github_ref = "main" (the asymmetric case the prior fixtures missed). Same for refs/tags/<tag> → <tag>.
  • PUT /orgs/{org}/dashboard-template and GET /orgs/{org}/dashboard-template (and the project-override variant) return commit_sha (null before first sync, set after), web_url (correct for both root_path == "/" — no path segment — and a non-root path — appended without double slashes), and last_sync_queue_job_url.
  • POST /docverse/orgs/{org}/dashboard-template/sync and POST /docverse/orgs/{org}/projects/{project}/dashboard-template/sync respond 202 with binding_id, queue_job_id, and queue_job_url (URL ends in the job's id). An org admin (non-super-admin) can call both successfully; a super-admin can call both successfully (existing resolve_role() escalation path); a non-admin caller receives 403; calling either route on an org/project with no binding configured returns 404. The legacy POST /admin/dashboard-templates/{binding_id}/sync is gone (verify via the OpenAPI schema).
  • In the rendered /docverse/openapi.json (or /docverse/docs), GET/PUT/DELETE/POST(sync) /orgs/{org}/dashboard-template appear under the orgs tag; GET/PUT/DELETE/POST(sync) /orgs/{org}/projects/{project}/dashboard-template appear under projects.
  • GET /orgs/{org} and GET /orgs/{org}/projects/{project} (plus the org-list and project-list variants) return absolute dashboard_template_url values resolving to the binding endpoints.
  • Alembic migration n2o3p4q5r6s7 runs both upgrade head and downgrade -1 cleanly against a populated test database — the last_sync_queue_job_id column, FK constraint, and FK index appear after upgrade and disappear after downgrade.
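The signed/unsigned cases above can be driven by hand: GitHub's X-Hub-Signature-256 header is an HMAC-SHA256 of the raw request body keyed by the webhook secret. A small signing helper (the function name is hypothetical) makes the 200-vs-401 checks reproducible:

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the value GitHub sends in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"
```

Posting the body with this header and the deployment's real secret should yield 200; the same body signed with any other secret should yield 401.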

References

Close the live-update path for GitHub-backed dashboard templates: a
new POST /docverse/webhooks/github endpoint verifies the HMAC, parses
the push payload, and enqueues one dashboard_sync job per binding
whose root_path was actually touched by the push. Truncated payloads
fall back to the GitHub compare API; mis-signed deliveries get 401;
deployments without GitHub App secrets configured see a 404.

Key decisions:
- Route HMAC validation through gidgethub.sansio.Event.from_http and
  dispatch through gidgethub.routing.Router rather than safir.github
  (which only exposes Pydantic webhook payload models — no router or
  HMAC helper, despite the issue body's hopeful "or the equivalent
  registration API" wording). The gidgethub router is held at module
  scope: registering a fresh router per request would defeat its
  registration model. Keeps the handler aligned with Times Square /
  Semaphore and avoids hand-rolling HMAC, satisfying user story 23.
- Feature-disabled (no GitHub App config) returns 404, not 503.
  Caught at handler entry by routing both
  ``get_github_webhook_secret`` and ``create_push_event_processor``
  through ``GitHubAppNotConfiguredError``. The 503 used by the
  org-admin binding endpoints (#235) would surface as a permanent
  GitHub redelivery loop on a misconfigured deployment; 404 hides
  the URL until secrets are wired and lets GitHub stop retrying.
- Added ``Factory.get_github_webhook_secret`` (public accessor) so
  the webhook handler does not reach into ``_github_webhook_secret``
  directly. Mirrors ``create_github_app_client``: validates that all
  three secrets are set and raises the same
  ``GitHubAppNotConfiguredError`` so the handler's single
  except-clause covers both paths.
- ``PushEventProcessor.process`` returns ``[]`` rather than raising
  on no-op cases (no bindings match repo/ref; no binding root_paths
  intersect changed paths; truncated payload with no before/after).
  The webhook handler's surrounding ``session.begin()`` + commit
  still runs on the empty path because logging the no-op outcome is
  cheap and the empty enqueue list is the natural success result —
  the Slack-integration loop in ``main.py`` makes the same trade.
- ``_root_path_matches`` normalises ``root_path`` exactly the way
  ``GitHubTreeFetcher._normalize_root`` does: ``"/"`` strips to ``""``
  and matches any non-empty changed-path set, while ``"templates/blue"``
  matches paths that start with ``"templates/blue/"`` or equal the
  bare directory entry. Keeps the webhook filter and the syncer's
  filter in lockstep so a binding never gets a sync enqueued for a
  path the syncer would then have ignored as out-of-scope.
- ``DashboardGitHubTemplateBindingStore.list_by_repo_ref`` is keyed
  by name (``github_owner``/``github_repo``/``github_ref``), not by
  numeric ID. Numeric-ID lookup is task #240's job; this slice
  matches by the same composite name index
  (``idx_dashboard_github_template_bindings_repo_ref``) that the PRD
  identifies as the push-lookup index for v1, with the rename-
  robustness ID-preferential lookup deferred per the PRD's "GitHub
  identity stability" section.
- The webhook handler's gidgethub router is registered with a single
  ``"push"`` callback. ``ping`` and any other event types arrive,
  signature-verify, and return 200 with no enqueue — gidgethub's
  ``dispatch`` is a fan-out to a possibly empty callback set, so
  unrelated events naturally no-op. This satisfies user story 23
  ("does not hand-roll HMAC or event parsing") and keeps GitHub's
  redelivery machinery quiet for events we deliberately ignore.
- Test fixture ``github_app_enabled`` toggles the three GitHub App
  secrets on the autouse ``context_dependency`` for the lifetime of
  one test and restores them on teardown. The default-disabled state
  preserved between tests is what makes the 404-feature-disabled
  test trivial: it just hits the URL with no fixture and asserts on
  the absence of GitHub config.

Next-iteration notes:
- Slice #240/#242 (rename-robustness) will add an ID-preferential
  binding lookup alongside ``list_by_repo_ref``. The push processor
  can then consult ``list_by_repo_id_ref`` first and fall back to
  the name-based path for un-synced bindings. The handler's plumbing
  does not need to change — only the processor's binding-lookup step.
- The handler currently registers only ``push`` events. Adding
  ``installation`` (created/deleted/suspend/unsuspend) and
  ``repository.renamed`` / ``repository.transferred`` /
  ``organization.renamed`` event handlers is task #242's scope; they
  plug into the same ``_event_router.register(...)`` decorator, so
  no further wiring change is needed. The
  ``create_push_event_processor`` factory method will likely sprout
  sibling ``create_installation_event_processor`` /
  ``create_rename_event_processor`` methods.
- Squarebot/Kafka event delivery (PRD §"Out of Scope") still goes
  through this same ``PushEventProcessor`` if it ever lands — the
  only thing the Squarebot path would change is the request entry
  point (a Kafka consumer instead of a FastAPI endpoint). The
  processor itself takes a raw payload dict, so swapping the input
  layer is a tracer-bullet exercise.
- Payload field extraction is duck-typed off the raw push dict
  (``payload["repository"]["owner"]["login"]`` etc.) rather than
  going through ``safir.github.GitHubPushEventModel``. The model
  only carries ``repository`` / ``installation`` / ``ref`` /
  ``before`` / ``after`` — it doesn't include the ``commits`` array
  that ``extract_changed_paths_from_push`` reads — so once we have
  to drop into the dict for ``commits`` and ``size`` anyway, parsing
  the rest off the dict is the simpler shape. A future refactor
  could parse a richer model and pass it through; the dict path is
  the minimum needed today.
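The duck-typed extraction described above can be sketched as follows. The function name appears in the source; the return shape and exact truncation test are assumptions reconstructed from the validation notes (``size > len(commits)``, plus GitHub's 20-commit cap on push payloads):

```python
def extract_changed_paths_from_push(payload: dict) -> tuple[set[str], bool]:
    """Collect changed paths from the in-payload commit list.

    Returns (paths, truncated); a truncated payload signals the caller
    to fall back to the GitHub compare API.
    """
    commits = payload.get("commits") or []
    size = payload.get("size", len(commits))
    truncated = size > len(commits) or len(commits) >= 20
    paths: set[str] = set()
    for commit in commits:
        for key in ("added", "modified", "removed"):
            paths.update(commit.get(key) or [])
    return paths, truncated
```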

Closes #238
Three review-driven polishes on the GitHub webhook entry point:
(1) type ``_lowercase_headers`` against ``Mapping[str, str]``
(FastAPI's ``Request.headers`` already satisfies it, so the
``hasattr`` guard was dead code); (2) collapse
``Factory.get_github_webhook_secret`` and
``create_push_event_processor`` into a single
``create_webhook_dispatch`` accessor so the handler's
``except GitHubAppNotConfiguredError`` wraps one call site; and
(3) drop the redundant ``delivery_id`` field from the ``Processed
push webhook`` log line — ``github_delivery_id`` is already bound on
the context logger and the duplicate key fragmented log greppability.

Key decisions:
- ``create_webhook_dispatch`` returns the bare ``(secret, processor)``
  tuple rather than a small dataclass / namedtuple. The handler is
  the only caller and immediately destructures the pair, so giving
  the bundle a name would be ceremony for one call site. If a
  second caller materializes (e.g. a Squarebot-mediated webhook
  path), promoting to a typed container is the cheap follow-up.
- The new accessor performs its own three-secret presence check up
  front rather than letting ``create_github_app_client`` raise from
  inside the processor build. This keeps the single
  ``GitHubAppNotConfiguredError`` raise the handler observes
  cheap (no transitive imports / object construction before we
  decide the feature is off) and matches the pattern the displaced
  ``get_github_webhook_secret`` already used.
- Both replaced methods (``get_github_webhook_secret`` and
  ``create_push_event_processor``) had no other call sites, so they
  are deleted outright rather than left as deprecated shims. CLAUDE.md
  guidance is to skip backwards-compatibility hacks when the codebase
  has the only callers in scope.

Next-iteration notes:
- None. The rename-robustness handlers (#240/#242) will add sibling
  ``create_installation_event_processor`` /
  ``create_rename_event_processor`` factory methods that follow the
  same single-accessor shape — they don't need to thread through
  ``create_webhook_dispatch``.

Closes #266
Two test-infrastructure improvements removing direct private-attr
access from the webhook test suite (and the broader test suite that
copied the same shape). ``ContextDependency.set_github_secrets`` lets
the ``github_app_enabled`` fixture (and any future caller) toggle the
three GitHub-App secret slots without poking the singleton's private
attributes, and a new ``tests/support/arq_testing.py`` module
centralizes ``MockArqQueue._job_metadata`` reads behind
``count_jobs_by_name`` / ``get_jobs_by_name`` / ``queue_names``
helpers.

Key decisions:
- ``set_github_secrets(*, app_id, private_key, webhook_secret)`` is
  the supported test-fixture toggle, not ``initialize``. The existing
  async ``initialize`` covers the broader process-lifetime
  configuration (eight optional kwargs, including
  ``user_info_store`` / ``credential_encryptor`` / ``discovery``) and
  has the "only honor non-None values" semantics that the autouse
  ``app`` fixture needs at startup. A test fixture toggling the
  feature on for one test and off again on teardown wants the
  opposite contract: ``None`` should clear the slot. The new method
  is sync, scoped to the three GitHub-App fields, and treats every
  argument as authoritative.
- The migration scope went past the three call sites the issue
  body listed (``webhooks_github_test``, ``queue_backend_test``,
  ``build_processing_test``) and swept up the other six tests that
  had copy-pasted the same ``[j for queue in mock._job_metadata.values()
  for j in queue.values() if j.name == NAME]`` shape (``publish_edition_test``,
  ``edition_patch_override_test``, ``edition_rollback_test``,
  ``admin_dashboard_template_test``, ``dashboard_template_test``,
  ``dashboard_enqueue_triggers_test``). Without that the acceptance
  criterion's ``grep -R "_job_metadata" tests/`` would still hit the
  copies and the centralization would be only nominally complete. The
  remaining grep matches in ``tests/storage/queue_backend_test.py``
  are calls to the *public* ``ArqQueueBackend.get_job_metadata``
  method, not private-attr reads — they are unrelated to the
  centralization the issue is asking for.
- Added a third helper ``queue_names(arq_queue) -> set[str]`` so
  ``build_processing_test``'s ``"arq:queue" not in
  mock_arq._job_metadata`` invariant (jobs landed under the
  configured queue name, not arq's default) can be expressed without
  reaching for the private attribute. The issue body listed only
  ``count_jobs_by_name`` and ``get_jobs_by_name``; the third helper
  is the cheapest way to satisfy the acceptance grep without
  inventing a workaround at one call site.
- Added a fourth helper ``register_queue(arq_queue, name)`` purely so
  ``arq_testing_test.py`` can exercise the multi-queue branch of
  ``get_jobs_by_name`` / ``count_jobs_by_name`` without needing to
  touch ``_job_metadata`` itself. ``MockArqQueue`` only auto-creates
  the slot for its ``default_queue_name``; enqueueing into any other
  queue raises ``KeyError`` until the slot is added. This keeps the
  helper's own test on the public-helper side of the centralization
  line.
- The ``github_app_enabled`` fixture still reads
  ``context_dependency._github_app_id`` (etc.) when capturing the
  prior state for restoration on teardown. The acceptance criterion
  is scoped to "private-attr *assignment*"; the issue's
  out-of-scope follow-up note covers the broader read-of-private-
  attr cleanup. Adding read accessors today would be API-creep
  without a paying caller.
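The helper shapes can be sketched from the list comprehension quoted above. The three names come from the issue; the bodies are assumptions, and ``_job_metadata`` (queue name → {job id: metadata}) is safir's internal shape that the module deliberately centralizes:

```python
def get_jobs_by_name(arq_queue, name: str) -> list:
    """All job-metadata records whose job name matches, across queues."""
    return [
        job
        for queue in arq_queue._job_metadata.values()
        for job in queue.values()
        if job.name == name
    ]


def count_jobs_by_name(arq_queue, name: str) -> int:
    """Number of enqueued jobs with the given name."""
    return len(get_jobs_by_name(arq_queue, name))


def queue_names(arq_queue) -> set[str]:
    """Names of all queues the mock has slots for."""
    return set(arq_queue._job_metadata)
```

Callers assert on counts (``count_jobs_by_name(mock, "dashboard_sync") == 1``) instead of reaching into the private attribute themselves.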

Next-iteration notes:
- ``tests/conftest.py:119-125`` still pokes
  ``_superadmin_usernames`` and ``_user_info_store`` directly on the
  singleton. The issue body called this out explicitly as a parallel
  follow-up. The pattern needed is the same — a public
  ``set_admin_overrides(...)`` (or two narrow setters) on
  ``ContextDependency`` and a fixture migration. Out of scope here.
- The ``register_queue`` helper exists today only because the
  helper's own test needs to enqueue into a non-default queue. If
  a future production-side test materializes that legitimately
  needs more than one queue (rather than just inspecting an
  off-default-queue assertion), promote ``register_queue`` to a
  documented part of the helper API; otherwise it can stay as the
  small one-liner it is.
- Six adjacent test files now import from
  ``tests.support.arq_testing``. If ``safir.arq`` ever exposes a
  public iteration API, the four ``# noqa: SLF001`` lines inside
  ``arq_testing.py`` are the single place to swap the implementation
  — no caller-side change needed.

Closes #269
Both create_github_app_client and create_webhook_dispatch hand-rolled
the same three-way "is None" check on the GitHub App secrets and
raised an identical GitHubAppNotConfiguredError. Collapse the two
sites onto a single private _require_github_app_config helper that
returns the resolved (app_id, private_key, webhook_secret) tuple
typed as non-None, so a future configuration-shape change cannot
drift between the two callers.

Key decisions:
- The helper returns a bare tuple rather than a dataclass / NamedTuple.
  Both call sites destructure once and ignore the slots they don't
  need (``_, _, webhook_secret`` and ``app_id, private_key, _``). A
  named container would be ceremony for two call sites whose use of
  the unused slots is a single underscore. If a third caller
  materializes (e.g. installation/rename event processors per task
  #242) and starts paying attention to a third field, promoting to a
  named bundle is the cheap follow-up.
- The helper is private (``_require_*``) because it leaks the bare
  ``SecretStr`` instances rather than their decoded values. The two
  callers each handle ``.get_secret_value()`` themselves at the seam
  where the secret crosses into the GitHub-App / HMAC-verifier
  boundary, which keeps the helper safe to call from any future
  factory method without deciding the secret's lifetime.
- The HTTP-client check stays inline in each caller. Only
  ``create_webhook_dispatch`` and ``create_github_app_client`` need
  the shared client, and each raises a method-specific
  ``RuntimeError`` message; folding that into the helper would force
  every future caller to also need an HTTP client (e.g. an
  HMAC-only verifier doesn't), so the responsibility stays scoped.

Next-iteration notes:
- Slice #242's rename / installation event processors will need the
  same three secrets; they can call ``_require_github_app_config``
  directly. Promoting the helper to public (drop the underscore) is
  not necessary unless a non-Factory caller wants the same
  precondition check.

Closes #267
Two style fixes in push_processor.py from PR #264 review: narrow
_root_path_matches's changed_paths parameter from Iterable[str] to
Sequence[str] so the empty-check can use bool(changed_paths), which is
sound on a sequence but always True on a generator that any() would
have consumed; and collapse process()'s payload.get("repository") or
{} / repo.get("owner") or {} chain to the cleaner two-arg
.get(..., {}) form.
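The hazard is easy to reproduce in isolation: ``any()`` consumes a generator (at least partially), after which ``bool()`` on the same object is unconditionally True, whereas both operations are sound and non-destructive on a sequence:

```python
# An empty generator is still a truthy object, so bool() is unsound
# as an empty-check on anything typed Iterable:
empty = (p for p in [])
assert not any(True for _ in empty)  # correctly reports "no items"
assert bool(empty)                   # the exhausted generator is truthy

# On a Sequence, bool() is sound and does not consume anything:
paths: list[str] = []
assert not bool(paths)
assert not any(True for _ in paths)  # a list can be re-iterated safely
```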

Key decisions:
- Sequence (not list) is the narrowest type that supports bool() and
  still accepts both call sites — both pass the materialized list[str]
  returned by _resolve_changed_paths, so no caller change is needed.
  Using list[str] would have over-constrained the contract for no
  caller benefit.
- The any(True for _ in ...) pattern is correct on the current
  list[str] callers (a list is iterable and re-iterable), but the
  Iterable[str] type signature advertised support for generators,
  where any(True for _ in g) consumes the generator and bool(g) on
  the same generator is unconditionally True. Narrowing the type +
  switching to bool() forecloses that latent bug rather than
  documenting around it.

Next-iteration notes:
- None. The two-arg .get(..., {}) form is now the convention in this
  module; future processors added under dashboard_templates/ should
  follow the same shape.

Closes #268
The tests/** ruff per-file-ignores already cover SLF001, so the
inline # noqa: SLF001 markers next to MockArqQueue._job_metadata
accesses were unused. Removing them lets ruff format reflow two
short call sites onto one line.
@jonathansick force-pushed the tickets/DM-54689-github-webhook branch from d2bd6ef to ec9eb27 on April 27, 2026 at 16:43
Replace ``_split_full_name`` and ``_repo_from_full_name`` (each
returning one half of a parsed ``owner/repo`` string) with a single
``_split_full_name_tuple`` returning ``tuple[str, str] | None``. The
two ``process()`` call sites destructure the tuple once with a
``(None, None)`` fallback so the existing ``or``-chain semantics
(prefer ``repository.owner.login``/``repository.name``, fall back to
the parsed full_name) are preserved without parsing the same
``full_name`` twice.

Key decisions:
- Destructure once at the top of the block into ``fallback_owner`` /
  ``fallback_repo`` rather than calling the helper inline twice. The
  inline form would have re-parsed the same ``"full_name"`` string
  for each half — the duplication that motivated the original two
  helpers and still motivates them under one name. The
  ``or (None, None)`` fallback keeps the call sites' ``or``-chain
  truthy-test on a plain ``str | None`` rather than an
  ``Optional[tuple]`` membership check.
- The helper signature stays ``full_name: object`` (not ``str | None``)
  to match the duck-typed payload-dict access pattern that the rest
  of ``process()`` uses on ``payload.get(...)`` /
  ``repo.get("full_name")``. Tightening to ``str | None`` would push
  ``isinstance``/cast noise into the caller for no caller benefit.
- The collapsed helper still validates ``"/"`` membership via
  ``isinstance(full_name, str) and "/" in full_name`` before
  splitting, so a missing or malformed ``full_name`` returns ``None``
  exactly as the prior pair did. ``str.split("/", 1)`` on a string
  containing ``"/"`` is guaranteed to yield two items, so the
  ``owner, repo = ...`` unpack is total.

Next-iteration notes:
- None. The module-level helper count drops from two to one and the
  call site shrinks by one line; no follow-up scaffolding is left.

Closes #270
PR #264 has unit-level coverage for both the truncation signal in
extract_changed_paths_from_push and the compare-API fallback in the
push processor, but neither path is exercised end-to-end through the
HMAC-validated webhook handler. Add two handler tests: a truncated
push (size=30, one in-payload commit limited to docs/) seeds a compare
response that does land inside the binding's root_path and asserts
exactly one dashboard_sync enqueue, and an empty-commits push
(size=0, commits=[]) asserts zero enqueues plus zero compare-API
requests via mock_github.router.calls.

Key decisions:
- Extended the existing _push_payload helper with explicit ``commits``
  and ``size`` overrides rather than constructing two ad-hoc payload
  dicts. The in-tree style (PushEventProcessor's own service tests)
  already exposes both knobs on a similar helper, and the helper
  caller still defaults to the one-commit / size=1 happy path that
  every existing test relies on. Mutating the existing keyword surface
  preserves the four older callers without churn.
- The empty-commits assertion checks ``mock_github.router.calls``
  instead of registering and asserting on a respx route handle. The
  ``seed_compare`` helper does not return its Route, and the issue's
  acceptance criterion is "compare route saw zero requests" — a
  call-list filter on ``"/compare/"`` in the URL path satisfies that
  with no signature change to ``seed_compare``. The assertion is
  scoped to compare-shaped URLs so it tolerates the always-mocked
  Repertoire-discovery and installation-token traffic that shares the
  same router.
- Truncated-push payload uses ``size=30`` (>20) with a single
  in-payload commit. Either ``size > len(commits)`` or
  ``len(commits) >= 20`` would trigger the fallback; ``size=30`` is
  the more readable signal and matches the shape used by the existing
  push_processor service tests so the two layers stay legible
  side-by-side. The in-payload commit deliberately reports only
  ``docs/index.md`` (with ``root_path="/"`` the inside/outside
  distinction does not matter today, but under a future path
  restriction this asymmetry guarantees the compare call is what
  actually surfaces the template path).
- The empty-commits test uses ``root_path="/"`` (not a nested root)
  so the assertion isolates the empty-changed-path branch of
  ``_root_path_matches`` (``bool(changed_paths)`` is the gate) rather
  than the prefix branch. A nested root would also pass but would
  conflate two reasons for not enqueueing.

Next-iteration notes:
- None. The handler-level coverage now matches the service-level
  coverage for both truncation paths; rename-handler / installation-
  event tests landing under #240/#242 will follow the same pattern
  (``github_app_enabled`` fixture + ``count_jobs_by_name`` +
  ``mock_github.router.calls`` filter) without needing further
  helper plumbing.

Closes #271
A misconfigured GitHub App private key surfaced as
``jwt.exceptions.InvalidKeyError`` from
``GitHubAppClientFactory.get_app_jwt`` and propagated past the syncer's
narrow ``except`` (``httpx.HTTPError | gidgethub.GitHubException``) into
the worker's outer ``except Exception``. That branch failed the queue
job but never touched the binding, so ``last_sync_status`` stayed at its
server-default ``"pending"`` forever — silently lying about the
deployment's GitHub-App config. Two layered fixes: add
``jwt.exceptions.InvalidKeyError`` to the syncer's auth catch (so the
existing ``_record_failure`` path writes a friendly ``last_sync_error``)
and extend the worker's outer ``except Exception`` to also call
``binding_store.update_sync_state(status="failed", error=...)`` before
failing the queue job.

Key decisions:
- Broadened the auth catch's failure message from "GitHub App
  installation lookup failed" to "GitHub App authentication failed".
  All three exception types now caught (httpx, gidgethub, JWT key
  errors) are auth-time failures — the JWT-key branch never reaches
  the installation lookup, so the older wording would have been
  misleading for one of the three. The existing two callers
  (``httpx.HTTPError``, ``gidgethub.GitHubException``) read more
  accurately under the broader phrasing too, since auth failures
  before the lookup also count.
- Two separate ``session.begin()`` blocks in the worker's safety net
  (binding update first, queue-job fail second). The user-visible
  state on the binding is what operators read to diagnose; flipping
  it before the queue-job write means a transient
  ``queue_job_store.fail`` failure cannot leave the binding still
  reporting ``"pending"``. Matches the approach sketch in #276.
- Worker error message uses ``f"{type(exc).__name__}: {exc}"`` (e.g.
  ``"RuntimeError: unexpected syncer bug"``) rather than ``str(exc)``
  alone. The syncer already writes friendly per-cause messages for
  the cases it anticipates; this branch only fires for the cases it
  doesn't, so a programming-error class name is the most useful piece
  of context an operator can read out of ``last_sync_error``.
- Test for the JWT path drives a real ``GitHubAppClientFactory`` with
  a malformed PEM rather than monkeypatching ``get_app_jwt`` to raise
  ``InvalidKeyError``. The point of the test is that the real pyjwt
  → gidgethub → safir.github call chain surfaces the exception type
  the syncer's tuple expects; mocking the raise would have asserted
  only on the catch tuple, not on the real propagation path that
  motivated this fix.
- Test for the worker safety net monkeypatches ``DashboardTemplateSyncer.sync``
  directly to raise ``RuntimeError``. The point is "what happens when
  the syncer raises *anything* unexpected" — going through a
  realistic exception-producing dependency (e.g. a bad key, plus
  reverting the sync.py fix) would conflate two things and break
  symmetry with the existing
  ``test_sync_unexpected_*_error_propagates`` tests, which already
  use the same pattern at the service-test layer.

Next-iteration notes:
- None. The two layers (sync.py narrow catch, dashboard_sync.py
  safety net) cover both the specific JWT case and any future
  programming error / library exception the syncer doesn't catch.
  Future per-cause messages (e.g. distinguishing
  ``InvalidKeyError`` from ``DecodeError``) can refine the syncer's
  error string in place without touching the worker.

Closes #276
Split ``handlers.orgs.dashboard_template`` into two routers — one for
the org-default ``/orgs/{org}/dashboard-template`` endpoints and one
for the project-override
``/orgs/{org}/projects/{project}/dashboard-template`` endpoints —
and include each on ``orgs_router`` with its own ``tags=`` argument
so the org-default routes carry ``orgs`` and the project-override
routes keep ``projects``. This brings the dashboard-template routes
in line with the convention the rest of ``handlers/orgs/__init__.py``
already follows (members / credentials / services tagged ``orgs``;
project-nested routes tagged ``projects``).

Key decisions:
- Split the module-level router rather than applying per-route
  ``tags=[...]`` overrides on each org-default endpoint. The PRD's
  approach sketch flagged the per-route alternative as cleaner for
  this single endpoint pair but more brittle for the future
  ``POST /orgs/{org}/dashboard-template/sync`` endpoint, which will
  also need ``tags=["orgs"]``: the split-router approach gives that
  next slice a no-decision home (``org_default_router.post(...)``)
  whereas the per-route approach forces every new contributor to
  remember to repeat the override. Cost is one extra ``include_router``
  call in ``__init__.py``; benefit is the handler module has the
  tag-grouping shape baked into its structure rather than spread
  across six decorators.
- Route names (``get_org_dashboard_template`` etc.) are unchanged —
  the issue's acceptance criterion explicitly calls out
  ``request.url_for(...)`` callers, and FastAPI keys named routes by
  the ``name=`` kwarg on the decorator (independent of which
  ``APIRouter`` registered them) so the rename of the router objects
  has no caller-visible effect. Verified by running the full suite —
  ``DashboardTemplateBindingResponse.from_domain`` and the admin-side
  ``request.url_for("get_org_dashboard_template", ...)`` calls all
  resolve through the same router-name registry.
- The new test reads the live ``/docverse/openapi.json`` schema rather
  than introspecting ``app.openapi()`` directly. The HTTP path
  exercises the same FastAPI app the real clients hit and matches the
  existing ``AsyncClient``-based test conventions in this file; the
  in-process ``app.openapi()`` call would have been slightly cheaper
  but would have required either a new fixture or a direct import of
  ``docverse.main.app`` (which the test file currently avoids).

Next-iteration notes:
- The next slice's ``POST /orgs/{org}/dashboard-template/sync`` route
  attaches to ``org_default_router`` and will inherit ``tags=["orgs"]``
  with no further wiring. The project-override sync route, when it
  lands, attaches to ``project_override_router`` and inherits
  ``tags=["projects"]``. No follow-up needed in this module for tag
  bookkeeping.

Closes #279
A misconfigured GitHub App (bad private key, wrong app_id, etc.)
previously surfaced only on the first dashboard_sync dequeue — operators
read the failure out of a worker traceback. Shift the failure left:
mint a JWT and call GET /app once at API service lifespan startup and
arq worker startup; on any exception, log a structured error and flip
a process-wide github_app_validated flag off so the binding endpoints
and webhook return their feature-disabled response for the lifetime
of the process. The service still starts.

Key decisions:
- Shared validator helper in storage/github/startup.py duck-typed
  against a GitHubAppValidationState protocol (github_app_enabled,
  github_app_id, set_github_app_validated). Both ContextDependency and
  WorkerFactoryBuilder implement the protocol so one helper covers the
  API lifespan and the worker startup hook. Building a sessionless
  Factory just to call create_github_app_client() would have required
  a stub session; the helper builds the GitHubAppClientFactory and
  GitHubAppClient directly from the same secrets the Factory pathway
  uses.
- The validator catches bare Exception. The two known failure modes
  (jwt.exceptions.InvalidKeyError on parse, gidgethub.GitHubException
  on a non-2xx /app response) are both subclasses of Exception, but
  the safir/pyjwt/gidgethub stack also raises raw httpx errors
  (connection refused, DNS failure) and a future safir update could
  surface a fourth exception type. The acceptance criterion is "log
  and disable, don't crash the service" — narrowing the catch would
  trade a clear behaviour for a brittle list of types.
- Factory._require_github_app_config gained a second raise leg that
  fires when github_app_validated is False. Single check site keeps
  the binding endpoints + webhook in sync, and lets the gate raise
  the same GitHubAppNotConfiguredError type the existing
  secrets-unset gate raises — handlers don't need a second except
  clause.
- set_github_secrets resets github_app_validated to True. A rotated
  secret bundle has not yet been rejected by the startup validator;
  the prior False state belongs to the prior bundle. This keeps the
  test fixture pattern (set_github_secrets then run a request)
  honest without a separate set_github_app_validated(value=True)
  call at every fixture seam.
- The lifespan + worker startup integration tests run their own
  LifespanManager / WorkerFactoryBuilder against a monkey-patched
  config. They cannot reuse the conftest ``app`` fixture because that
  one runs the lifespan with the real (unset) config. tests/main_test.py
  adds an autouse fixture that snapshots and restores the
  context_dependency singleton's secrets + validated flag so a
  passing-secrets test cannot pollute a later-running
  unset-secrets test.
- The lifespan test for the malformed-key case asserts on the
  webhook endpoint's 404 (not the binding PUT endpoint's response).
  Today's binding endpoints don't actually consult
  _require_github_app_config — try_enqueue_dashboard_sync would
  enqueue the job and the worker would surface the failure. The 404
  on the webhook endpoint is the observable user-visible
  acceptance signal; the binding endpoints' behaviour matches the
  secrets-unset case (both succeed at enqueue, both fail at the
  worker) by construction.
- Boolean setters (set_github_app_validated) are keyword-only
  (``*, value: bool``). Ruff's FBT001/FBT002 rules apply across the
  codebase and this matches the existing set_github_secrets pattern
  on ContextDependency.

Next-iteration notes:
- The binding endpoints (PUT /orgs/{org}/dashboard-template etc.)
  do not yet consult _require_github_app_config at the handler
  layer — the design doc's 503 feature-disabled response is not
  implemented for them today. Adding that gate is a separate slice;
  this validator already feeds a False decision through the same
  Factory checkpoint, so a future binding-endpoint gate would
  inherit the validated-False behaviour with no further
  context_dependency or worker plumbing.
- The worker startup test file (tests/worker/worker_startup_test.py)
  drives the validator against a real WorkerFactoryBuilder rather
  than running docverse.worker.main.startup end-to-end. The full
  startup requires a live Redis (RedisArqQueue.initialize); a
  Redis-backed worker startup test would belong in tests/integration
  if it ever materialises.

Closes #278
Follow-up to the prior commit's startup validator: the GitHubAppValidationState
protocol's three abstract members needed docstrings (D102), the bare-Exception
catch needed an explicit BLE001/TRY400 waiver (per the prior commit's reasoning
about not narrowing the catch list), and the test fixture's default
``webhook_secret`` needed an S107 waiver. Also drops drive-by SLF001 # noqa
comments ruff flagged as unused (the tests/** per-file-ignores already cover
SLF001).

Key decisions:
- Three # noqa waivers (BLE001, TRY400, PLR0913) on the validator carry the
  reasoning inline. BLE001 is the load-bearing one — narrowing the catch to
  ``(InvalidKeyError, GitHubException, httpx.HTTPError)`` would miss any
  future safir/pyjwt update that surfaces a fourth type, regressing the
  acceptance criterion ("don't crash the service"). TRY400 (use logging.exception)
  would attach a traceback that would clutter operator logs without adding
  context — error_type and str(exc) already carry the diagnostically useful
  fields. PLR0913 is on a startup-only function called from two sites; the
  six kwargs are all distinct values that the call sites need to thread
  through, so packing them into a dataclass would be ceremony without a
  payoff.
- The S107 waiver on the ``webhook_secret`` fixture default covers a
  test-only placeholder value, not a real password. The fixture is in tests/main_test.py
  where the standard tests/** S106 ignore already applies; S107 (function
  default) is a separate code that needs its own waiver.

Next-iteration notes:
- None. This is purely a lint cleanup follow-up to the prior commit; no
  behaviour change.

Closes #278
Add a nullable last_sync_queue_job_id FK on
dashboard_github_template_bindings, populate it inside
DashboardSyncEnqueuer.enqueue, and surface the queue-job URL on every
binding response (PUT/GET org-default and project-override) plus the
super-admin force-sync response. An operator who reads
last_sync_status="failed" can now click straight through to the
queue-job traceback without correlating the binding with a separate
job ID.

Key decisions:
- The FK uses ondelete=SET NULL so queue-job retention pruning never
  breaks the binding row. The column is nullable permanently — pre-
  existing rows have no job, a freshly-created binding has no job
  until the enqueuer runs, and a pruned-job row legitimately reverts
  to NULL. No backfill is needed and the migration is reversible.
- The binding store's read methods left-join queue_jobs and
  materialize public_id in the same SELECT, so the response layer
  builds last_sync_queue_job_url without a second round-trip. The
  ``Select[...]`` return type on the join helper is widened to
  ``Any`` because SQLAlchemy's typed-Select claims public_id is
  ``int`` (the column is non-nullable on the queue_jobs side); a
  LEFT OUTER JOIN can yield ``int | None`` and the type stub cannot
  express that distinction without overloads on outerjoin.
- ``set_last_sync_queue_job`` issues a Core UPDATE that explicitly
  preserves date_updated by passing
  ``date_updated=SqlBinding.date_updated`` in the values dict. The
  column's onupdate=now() server default fires on every ORM mutation,
  but operator-visible date_updated tracks source-coordinate changes
  (owner/repo/ref/root_path) — a sync-bookkeeping write should not
  bump it. Without this, the existing PUT-idempotency invariant
  (test_*_put_is_idempotent_no_op) regresses: the first PUT's
  in-memory result.binding shows the pre-enqueue date_updated, while
  the second PUT's re-fetch from DB shows the post-enqueue
  date_updated, and the two responses no longer match.
- ``try_enqueue_dashboard_sync`` returns ``QueueJob | None`` instead
  of ``None``: success surfaces the just-created job so the PUT
  handler can attach its public_id to the response without a second
  fetch. Failure returns None and the handler keeps the prior FK
  visible (or None) — matching the existing
  preserve-last-good pattern that github_template_id already uses.
  Existing test_*_put_enqueue_failure_marks_binding_failed callers
  still see status="failed".
- ``_attach_queue_job`` lives in the handler module rather than as a
  ``from_domain`` kwarg because the response model takes a domain
  object and reads ``last_sync_queue_job_public_id`` off it; threading
  a separate queue-job kwarg through ``from_domain`` would have made
  every caller (admin GET-by-id-style sites included) decide whether
  it has a queue job to attach. Keeping the override scoped to the
  two PUT handlers that own the enqueue flow is narrower.
- The admin force-sync response gains queue_job_url constructed the
  same way as DashboardRebuildResponse.from_queue_job — same
  ``request.url_for("get_queue_job", ...)`` route name. Operators can
  click through without parsing queue_job_id by hand, matching the
  org-binding endpoints' shape.
- The new migration n2o3p4q5r6s7 lands a separate FK index
  (idx_dashboard_github_template_bindings_last_sync_queue_job_id) so
  queue-job retention's ``DELETE FROM queue_jobs WHERE …`` can use
  the FK index to find dependent binding rows during the SET NULL
  cascade. PostgreSQL doesn't auto-index FK columns, and a binding
  table with thousands of rows would otherwise force a sequential
  scan on every queue-job delete.
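
The date_updated-preserving UPDATE and the read-side left join can be sketched against a toy schema (SQLAlchemy Core; the table names and the counter-valued ``onupdate`` are test stand-ins for the real PostgreSQL ``now()`` columns):

```python
import sqlalchemy as sa

metadata = sa.MetaData()

queue_jobs = sa.Table(
    "queue_jobs",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("public_id", sa.String, nullable=False),
)

bindings = sa.Table(
    "bindings",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    # Counter stand-in for an onupdate=now() timestamp column.
    sa.Column("date_updated", sa.Integer, default=0, onupdate=lambda: 999),
    sa.Column(
        "last_sync_queue_job_id",
        sa.Integer,
        # SET NULL so queue-job retention pruning never breaks the row.
        sa.ForeignKey("queue_jobs.id", ondelete="SET NULL"),
        nullable=True,
    ),
)


def set_last_sync_queue_job(conn, binding_id: int, job_id: int) -> None:
    """Attach the queue-job FK without bumping date_updated."""
    conn.execute(
        sa.update(bindings)
        .where(bindings.c.id == binding_id)
        .values(
            last_sync_queue_job_id=job_id,
            # Re-assert the current value so the column's onupdate
            # default does not fire for this bookkeeping write.
            date_updated=bindings.c.date_updated,
        )
    )


def select_with_queue_job():
    """Left-join queue_jobs so public_id arrives in the same SELECT;
    after the outer join it is int | None despite the NOT NULL stub."""
    return sa.select(
        bindings.c.id, bindings.c.date_updated, queue_jobs.c.public_id
    ).select_from(bindings.outerjoin(queue_jobs))
```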

Next-iteration notes:
- ``update_sync_state`` (called by the dashboard_sync worker) still
  bumps date_updated on every status flip via the column's
  onupdate=now() default. That is pre-existing behaviour — this
  ticket is scoped to the enqueue-time bookkeeping write — but if a
  future task wants strict source-only date_updated semantics, the
  same pattern (``date_updated=SqlBinding.date_updated`` in the
  Core UPDATE values dict) extends naturally.
- Sync-failure semantics deliberately preserve the prior FK (mirroring
  github_template_id's preserve-last-good pattern). The acceptance
  criterion "last_sync_queue_job_url remains null" is exact for newly-
  created bindings; for an existing binding whose prior sync succeeded
  and whose next enqueue then fails, the URL still points at the
  prior queue_job. If a future task wants the FK cleared on failure
  too, the failure handler in ``try_enqueue_dashboard_sync`` is the
  single place to extend.
- ``last_sync_queue_job_public_id`` is materialized via a left join
  on every read. If a future workload turns out to be read-heavy
  enough that the join is a measurable cost, the column could
  alternatively be denormalized onto the binding row
  (``last_sync_queue_job_public_id`` stored directly). The current shape — FK +
  join — is preferred because it keeps the public_id authoritative
  on queue_jobs and lets the SET NULL cascade naturally invalidate
  pruned URLs without a second write path.

Closes #277
The dashboard-template binding response previously exposed neither the
commit SHA of the most recent successful sync nor a clickable URL into
the GitHub browse UI. Add both fields to the response model. ``web_url``
is derived from the binding's source coordinates (always present once
a binding exists, regardless of sync state); ``commit_sha`` is
materialized via a new left join from
``dashboard_github_template_bindings`` to ``dashboard_github_templates``
on ``github_template_id`` and stays ``None`` until the first successful
sync links a content row.

Key decisions:
- ``build_github_browse_url`` lives in ``storage/github/web_url.py``
  next to ``tree_fetcher.py`` so all GitHub-URL knowledge stays in one
  place. The helper ``strip("/")`` normalises the root path: ``"/"``,
  ``""``, and ``"//"`` all collapse to a bare ``/tree/{ref}`` URL with
  no trailing slash, while ``"/themes/blue"``, ``"themes/blue"``, and
  ``"/themes/blue/"`` all produce the same ``…/tree/{ref}/themes/blue``
  output. Refs containing ``/`` (``release/1.0``) pass through
  verbatim — GitHub's browse UI accepts embedded slashes in the
  ``/tree/...`` route without URL-encoding, and encoding the slash
  would break the click-through.
- ``commit_sha`` is exposed via a left join rather than a denormalised
  column on the binding row. Mirrors the existing
  ``last_sync_queue_job_public_id`` join pattern: the read methods
  materialise the joined column in the same SELECT so the response
  layer builds the field without a second round-trip, and the SET NULL
  cascade on ``github_template_id`` naturally invalidates a stale SHA
  when a content row is pruned. The single read-side join helper
  ``_select_with_joins`` replaces the prior ``_select_with_queue_job``
  and adds the second outer join; ``_to_domain`` now takes both joined
  columns and uses ``model_copy(update=…)`` only when at least one is
  non-NULL so the existing
  ``test_get_by_id_returns_none_public_id_when_no_queue_job``-style
  invariants still hold.
- The domain model gains ``commit_sha: str | None`` with
  ``default=None`` so ``DashboardGitHubTemplateBinding.model_validate(row)`` on a
  raw ``SqlDashboardGitHubTemplateBinding`` (the
  ``BindingStore.create`` path, which does not go through the join
  helper) keeps working unchanged: the SQL row has no ``commit_sha``
  attribute, but the field's default fills the gap.
- ``DashboardTemplateBindingResponse.from_domain`` reads
  ``domain.commit_sha`` (not the raw SQL row) and computes ``web_url``
  by calling ``build_github_browse_url`` directly; both org-default
  and project-override responses go through the same ``from_domain``
  classmethod so neither caller needs special-casing. The admin-side
  force-sync handler returns its own response shape and does not
  serialise the binding, so it is unaffected.
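
The helper's normalisation rules can be illustrated with a plausible reconstruction (the function name matches the description above; the body is a sketch, not the actual source):

```python
def build_github_browse_url(owner: str, repo: str, ref: str, root_path: str) -> str:
    """Build a GitHub browse URL from a binding's source coordinates.

    Sketch: "/", "", and "//" root paths all collapse to a bare
    /tree/{ref} URL with no trailing slash; refs with embedded
    slashes (release/1.0) pass through verbatim, un-encoded.
    """
    base = f"https://github.com/{owner}/{repo}/tree/{ref}"
    path = root_path.strip("/")
    return f"{base}/{path}" if path else base
```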

Next-iteration notes:
- The follow-up rename / installation event processors in #240 / #242
  do not touch the read-side join — they only mutate display strings
  and reachability flags on existing rows. No further plumbing is
  needed for this slice.
- ``commit_sha`` is exact for the synced commit, not for any later
  ref movement. If a future task wants "last successful sync commit"
  to remain pinned even after a binding is rebound to a different
  ref, the commit_sha column on ``dashboard_github_templates`` is
  already keyed by ``(owner, repo, ref, root_path)`` so the join
  follows the binding's current coordinates, not its sync history.
  This matches the design doc's overwrite-in-place semantics.

Closes #281
The dashboard-template binding endpoints were not discoverable from the
parent ``Organization`` or ``Project`` resource. Add a
``dashboard_template_url`` HATEOAS field on both response models so a
client browsing the org/project resource can follow the link straight to
the binding configuration. The field is populated from the existing
``get_org_dashboard_template`` / ``get_project_dashboard_template`` route
names — no new plumbing.

Key decisions:
- Field lands on the client base ``Organization`` and ``Project`` models
  (``client/src/docverse/client/models/{organizations,projects}.py``)
  rather than on the user-facing handler subclass only. The acceptance
  criterion explicitly enumerates four test surfaces (org-list, org-get,
  project-list, project-get) and the only org-list endpoint is the
  super-admin ``GET /admin/orgs`` which uses the admin
  ``Organization`` subclass — also extending ``_OrganizationBase``.
  Adding the field on the base means both subclasses inherit it
  uniformly and the URL is always present (the route exists regardless
  of whether the binding has been configured), so the contract matches
  the issue's "the URL itself is stable" framing. Cost: the admin
  ``from_domain`` needs a one-line populate; benefit: a uniform
  navigation link from any org response.
- The URL is computed via ``request.url_for(...)`` exactly like the
  sibling ``services_url`` / ``credentials_url`` / ``editions_url``
  fields. Because all three call-sites (user-facing org, admin org,
  project) live in the same FastAPI app and the binding route names
  (``get_org_dashboard_template``, ``get_project_dashboard_template``)
  are already registered by ``handlers/orgs/dashboard_template.py``,
  no router-include changes were needed.
- The client base ``Organization`` model uses
  ``ConfigDict(from_attributes=True)`` (no ``extra="forbid"``), so
  adding a required field does not break clients that deserialize
  server responses — they pick up the new field. ``OrganizationCreate``
  / ``OrganizationUpdate`` (which *do* use ``extra="forbid"``) are
  request-only models and do not gain the field, so callers cannot
  accidentally smuggle a HATEOAS URL into a write payload. Same
  reasoning applies to ``Project`` vs. ``ProjectCreate`` /
  ``ProjectUpdate``.
- Existing handler tests for the four surfaces gained one assertion
  each (``dashboard_template_url`` ends with the expected path and
  starts with ``http``), keeping the new coverage co-located with
  the existing per-endpoint URL assertions instead of fragmenting
  into a separate test module.
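
The response-model vs. request-model split can be sketched like this (field shapes are illustrative; only the two config choices matter):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Organization(BaseModel):
    """Response model: from_attributes, no extra='forbid', so adding
    a required field stays compatible with deserializing clients."""

    model_config = ConfigDict(from_attributes=True)

    slug: str
    dashboard_template_url: str  # HATEOAS link, always present


class OrganizationCreate(BaseModel):
    """Request model: extra='forbid' rejects a smuggled HATEOAS URL."""

    model_config = ConfigDict(extra="forbid")

    slug: str
```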

Next-iteration notes:
- None. The field is plumbed through the three response sites that
  return an ``Organization`` or ``Project`` model. Future
  ``OrganizationServiceSummary`` or similar embedded representations
  do not need this field — the link belongs on the top-level resource.

Closes #282
Replace ``POST /admin/dashboard-templates/{binding_id}/sync`` with two
slug-keyed routes scoped under the org/project the binding belongs to:
``POST /orgs/{org}/dashboard-template/sync`` re-syncs the org-default
binding and
``POST /orgs/{org}/projects/{project}/dashboard-template/sync``
re-syncs the project-override binding. Two motivations: the
public API must not expose database identifiers, and force-sync is a
routine operation that belongs with org admins, not super-admin only.

Key decisions:
- Both new handlers use the existing ``require_admin`` dependency.
  ``AuthorizationService.resolve_role()`` (services/authorization.py:31)
  already grants super-admins ``OrgRole.admin`` automatically, so a
  single ``require_admin`` covers "super-admin OR org-admin" without
  bespoke wiring — no second handler variant or dual-dep is needed.
  The slug-keyed URL already scopes the action to one org, so the
  super-admin escalation flows through the same code path the binding
  CRUD endpoints already use.
- The sync handlers do their own slug → binding resolution via
  ``DashboardTemplateBindingService.get_org_default`` /
  ``get_project_override`` (which raise ``NotFoundError`` → 404 when
  the binding is unconfigured), then hand the resulting binding id to
  ``DashboardSyncEnqueuer``. The double load (service + enqueuer) is
  intentional — keeping enqueue payloads keyed by binding id matches
  the worker contract and avoids leaking slug strings into the queue
  message — and the cost is one extra index hit per call.
- ``DashboardTemplateSyncEnqueuedResponse`` moved from
  ``handlers/admin/endpoints.py`` to ``handlers/orgs/models.py`` since
  the model is no longer admin-scoped. Added a
  ``from_queue_job(binding_id, queue_job, request)`` classmethod that
  mirrors the existing ``DashboardRebuildResponse.from_queue_job``
  shape so both POST handlers stay one line. The ``binding_id``
  field is preserved per the issue's "same response shape" acceptance
  criterion — the slug context is implicit in the URL but the integer
  id is still the authoritative pointer the worker logs against.
- The handlers attach to the same ``org_default_router`` /
  ``project_override_router`` split that #279 introduced, so the
  org-default sync POST inherits ``tags=["orgs"]`` and the project-
  override sync POST inherits ``tags=["projects"]`` with no per-route
  override needed. Matches the convention #279 established for the
  existing GET/PUT/DELETE endpoints on the same paths.
- The admin route + its tests are removed in the same change — the
  PRD has not shipped, so a deprecation alias would be churn for
  zero benefit. ``tests/handlers/admin_dashboard_template_test.py``
  is deleted and the equivalent coverage lands inline in
  ``tests/handlers/dashboard_template_test.py`` next to the binding
  CRUD tests it shares fixtures with.
- The OpenAPI tag test in ``test_dashboard_template_openapi_tags`` now
  also asserts the legacy admin path is gone (no
  ``/docverse/admin/dashboard-templates/{binding_id}/sync`` entry in
  ``paths``). Catches a future regression where a migration alias
  accidentally re-registers the old route.
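
The relocated response model's classmethod shape might look like this (``QueueJob`` and the URL resolver are stand-ins; the real handlers pass ``request.url_for``):

```python
from dataclasses import dataclass
from typing import Callable

from pydantic import BaseModel


@dataclass
class QueueJob:
    """Stand-in for the real queue-job record."""

    public_id: str


class DashboardTemplateSyncEnqueuedResponse(BaseModel):
    binding_id: int  # still the authoritative pointer worker logs use
    queue_job_url: str

    @classmethod
    def from_queue_job(
        cls,
        binding_id: int,
        queue_job: QueueJob,
        url_for: Callable[..., str],
    ) -> "DashboardTemplateSyncEnqueuedResponse":
        # Real code calls request.url_for("get_queue_job", ...); a
        # plain callable keeps this sketch self-contained.
        return cls(
            binding_id=binding_id,
            queue_job_url=url_for("get_queue_job", job_id=queue_job.public_id),
        )
```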

Next-iteration notes:
- The response still carries the integer ``binding_id`` field. If a
  later slice wants the public API fully purged of database ids, the
  field can drop with no further plumbing — the URL identifies the
  binding unambiguously and the queue-job public id is a separate
  base32-encoded value already.
- ``try_enqueue_dashboard_sync`` (services/dashboard_templates/
  enqueue.py:92) is the fire-and-forget variant the binding PUT
  handlers use; the new sync handlers call ``enqueuer.enqueue``
  directly because a force-sync's failure should bubble as a 500 and
  surface in the response, not be silently logged like a PUT-time
  enqueue. That asymmetry is deliberate and matches the prior admin
  route's behaviour.

Closes #280
…dings

The push webhook silently ignored real GitHub events because the binding
lookup compared refs as exact strings. GitHub push payloads always carry
``ref = "refs/heads/<branch>"``, but the binding store kept whatever the
operator typed — so a binding stored as ``main`` never matched the
incoming ``refs/heads/main`` event. Normalize at both write and read
seams: a Pydantic ``BeforeValidator``-style
``field_validator(mode="before")`` on
``DashboardTemplateBindingCreate.github_ref`` strips a
leading ``refs/heads/`` or ``refs/tags/`` prefix on input, and the push
processor strips the same prefixes from ``payload["ref"]`` before the
binding store lookup. Both sites call one shared
``normalize_github_ref`` helper.

Key decisions:
- The shared helper ``normalize_github_ref`` lives in
  ``client/src/docverse/client/models/dashboard_template.py`` (next
  to the model whose validator uses it), not in
  ``services/dashboard_templates/`` as the issue's approach sketch
  proposed. The client package is published independently of the
  server and its dependencies are restricted to ``pydantic``,
  ``httpx``, and ``click`` — importing from
  ``docverse.services.dashboard_templates`` into a client model
  would invert the package boundary and pull the entire server
  graph into client distributions. The server's ``push_processor``
  already imports ``DashboardTemplateBindingCreate`` from
  ``docverse.client.models``, so importing the helper from the same
  module is a no-cost extension. Mirror of the
  ``normalize_base_domain`` pattern in
  ``client/.../models/organizations.py``.
- ``mode="before"`` on the validator runs the prefix-strip *before*
  Pydantic applies the ``min_length=1`` / ``max_length=256``
  constraints from the ``Field(...)`` annotation, so
  ``refs/heads/`` alone (a fully-qualified ref with an empty branch
  name) normalizes to ``""`` and then fails ``min_length=1`` with
  the standard pydantic error message. Same for ``refs/tags/``.
  Length bounds stay 1–256.
- The push processor now logs ``"No bindings match push event"`` at
  ``info`` (was ``debug``) and includes both the raw and the
  normalized refs (``github_ref_raw`` + ``github_ref``). When a
  future operator hits a mismatch — different repo, mismatched
  branch, etc. — the ref pair shows up at the default verbosity
  instead of requiring a logging-level dial-up. The other two
  push-processor info lines stay on raw-ref logging because by the
  time they fire the lookup has already succeeded; the
  diagnostically interesting comparison is the no-match case.
- Existing test fixtures that seeded bindings with
  ``github_ref="refs/heads/main"`` via the storage-level
  ``DashboardGitHubTemplateBindingCreate`` dataclass switch to the
  bare ``"main"`` form. Those storage-direct creates bypass the
  Pydantic validator, so they were testing a state that cannot
  exist after the validator: in production every PUT runs through
  the API, every API write normalizes, and stored refs are always
  canonical. Updating the fixtures to canonical form keeps them
  faithful to production data and avoids adding a third
  normalization site at the storage layer.
- The two new ``test_process_matches_bare_*_against_refs_*_payload``
  tests in ``push_processor_test.py`` are the asymmetric coverage
  the prior fixtures didn't have: every existing fixture used
  ``refs/heads/main`` on both sides, which is why the bug shipped
  without a failing test. A handler-level mirror in
  ``webhooks_github_test.py`` covers the same path through the
  HMAC-validated webhook seam, and a parametrized round-trip test
  in ``dashboard_template_test.py`` exercises both ``refs/heads/``
  and ``refs/tags/`` from the org-admin PUT side, asserting that
  the response body shows the bare form and that a re-PUT with the
  bare form is idempotent.
- The validator unit test in ``client/tests/`` covers the five
  cases the issue's acceptance criterion enumerates: bare branch,
  ``refs/heads/<branch>``, ``refs/tags/<tag>``, a 40-char hex SHA
  (must pass through untouched), and the empty string (must still
  fail ``min_length``). Plus two extras: a ref with an embedded
  slash (``release/1.0`` and ``refs/tags/release/1.0``) to lock in
  that only the leading prefix is stripped, and a non-tracked ref
  type (``refs/pull/123/head``) to confirm the strip list is
  bounded to the two prefixes operators care about.
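
The shared helper and the before-mode validator can be sketched as follows (the helper name is real per the description; the model is trimmed to the one field for illustration):

```python
from pydantic import BaseModel, Field, field_validator

_REF_PREFIXES = ("refs/heads/", "refs/tags/")


def normalize_github_ref(ref: str) -> str:
    """Strip one leading refs/heads/ or refs/tags/ prefix, if present."""
    for prefix in _REF_PREFIXES:
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref


class DashboardTemplateBindingCreate(BaseModel):
    github_ref: str = Field(min_length=1, max_length=256)

    @field_validator("github_ref", mode="before")
    @classmethod
    def _normalize_ref(cls, v: object) -> object:
        # mode="before" runs ahead of the min_length check, so
        # "refs/heads/" alone normalizes to "" and then fails
        # min_length=1 with the standard pydantic message.
        return normalize_github_ref(v) if isinstance(v, str) else v
```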

Next-iteration notes:
- No data backfill is needed. This branch is unmerged and the only
  known production binding (jsc-test-20260409) was already in bare
  form; any future writes go through the new validator. If a
  prefixed row ever does sneak in via a pre-validator code path,
  it will start matching as soon as the webhook side normalizes
  the lookup ref, but the canonical-form invariant on the row
  itself would be off — a future task could add a one-shot
  migration that rewrites any stored ``refs/heads/...`` /
  ``refs/tags/...`` rows to bare form for tidiness.
- The ``DashboardGitHubTemplateBindingCreate`` storage-level
  dataclass intentionally does *not* normalize. Its only callers
  are the binding service (which forwards a validated
  ``DashboardTemplateBindingCreate``) and tests that bypass the
  API. Adding a normalize at this layer would be belt-and-
  suspenders, but it would also let test fixtures lie about what
  production actually stores. If a future direct-storage code path
  ever lands (e.g. a CLI seeder), normalizing in
  ``binding_store.create`` is the right place to add it — calling
  the same shared helper.
- The other two push-processor ``info`` log lines (``"Push event
  affected no binding root_paths"`` and ``"Enqueued
  dashboard_sync jobs from push event"``) still log
  ``github_ref=ref`` (the raw value). They fire after the lookup
  has already matched, so the raw value is enough; switching them
  to also include ``github_ref_raw`` would be log-line churn for
  no diagnostic gain.

Closes #283
@jonathansick jonathansick merged commit be3c5fd into main May 1, 2026
22 checks passed
@jonathansick jonathansick deleted the tickets/DM-54689-github-webhook branch May 1, 2026 19:01
jonathansick added a commit that referenced this pull request May 1, 2026
Capture GitHub's stable repo / owner / installation IDs on first
successful sync, write them to bindings + content rows, and have the
push event processor prefer ID-keyed binding lookup with a name-based
fallback for un-synced rows. Internal matching now survives a GitHub
repo or org rename without operator intervention; the public API
remains name-keyed and exposes the captured IDs in the binding GET
response for debugging.

Key decisions:
- ``InstallationAuth`` grew a required ``installation_id: int`` field
  rather than a separate ``get_installation_id`` round-trip from the
  syncer. The app client already resolves the installation id inside
  ``get_installation_auth`` and was discarding it; threading it onto
  the auth record is a no-cost change for production callers and lets
  the syncer write ``github_installation_id`` to the binding without
  doubling the GitHub round-trip count. The dataclass field is
  required (no default) so production code cannot accidentally
  construct an auth record without it; the test fixture's
  ``installation_auth`` helper sources it from the mock's seeded
  ``(owner, repo) -> id`` map (or ``default_installation_id``) so
  existing direct-construction tests keep their ergonomic shape.
- The tree fetcher gained a ``_fetch_repo`` step that calls
  ``GET /repos/{owner}/{repo}`` rather than overloading the existing
  ``commits/{ref}`` call, because the commits response does not
  include the repository's numeric ID and adding the field to its
  schema is GitHub's prerogative. Repo + owner IDs land on
  ``FetchedTree`` as required ``int`` fields rather than
  ``int | None`` because the syncer is the only consumer in this
  slice and it always wants both — leaving them ``Optional`` would
  have leaked ``None`` into the store-layer guard for no benefit.
  The ``raise_for_status`` chain catches a missing repo
  deterministically, so a real GitHub anomaly (e.g. installation lost
  mid-sync)
  surfaces as a recorded sync failure rather than a silent no-id
  capture.
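As a hedged sketch of the ``_fetch_repo`` step: call ``GET /repos/{owner}/{repo}`` and lift GitHub's numeric IDs onto a fetched-tree-style record as required ints. The client protocol (a gidgethub-like ``getitem``) and the ``RepoIds`` record are assumptions for illustration, not the project's real API.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class GitHubClient(Protocol):
    """Minimal gidgethub-style client surface assumed for this sketch."""

    async def getitem(self, url: str) -> dict[str, Any]: ...


@dataclass(frozen=True)
class RepoIds:
    # Required ints, never Optional: the syncer is the only consumer and
    # always wants both, so ``None`` never leaks into the store layer.
    repo_id: int
    owner_id: int


async def fetch_repo_ids(client: GitHubClient, owner: str, repo: str) -> RepoIds:
    # gidgethub-style clients raise on a 404, so a missing repo surfaces
    # deterministically instead of yielding a silent no-id capture.
    data = await client.getitem(f"/repos/{owner}/{repo}")
    return RepoIds(repo_id=data["id"], owner_id=data["owner"]["id"])
```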
- ``DashboardGitHubTemplateBindingStore`` exposes two new methods
  rather than re-purposing ``list_by_repo_ref``:
  ``list_by_repo_id_and_ref`` for the ID-keyed primary lookup
  (backed by ``idx_dashboard_github_template_bindings_repo_id_ref``
  from #240) and ``list_unsynced_by_repo_ref`` whose
  ``github_repo_id IS NULL`` filter encodes the rename-safety
  invariant directly. Without that filter, a binding whose name
  coincidentally matches a *different* repo now sitting at the old
  string would re-sync against the wrong upstream; pushing the
  filter into the store query (vs. dedup-after-the-fact) keeps the
  invariant load-bearing in code rather than only in tests.
  ``list_by_repo_ref`` stays in the API surface even though the
  push processor no longer calls it, because the binding store is
  a public storage primitive and external callers (admin tooling,
  future debug consoles) may legitimately want the unfiltered
  name-only view.
- ``PushEventProcessor._lookup_bindings`` unions the two store calls
  with a ``seen: set[int]`` dedup guard. The two queries are
  disjoint by construction (the unsynced filter excludes IDs the
  primary lookup might find), so the dedup is defense-in-depth — but
  the ``test_process_dedupes_id_and_name_matches`` test pins the
  guard so a future loosening of either query (e.g. dropping the
  ``IS NULL`` filter "for symmetry") cannot regress to
  double-enqueueing the same ``dashboard_sync`` job. ``_coerce_int``
  rejects
  ``bool`` explicitly because ``isinstance(True, int)`` is ``True``
  in Python and a malformed payload that sent ``"id": true`` would
  otherwise leak ``1`` through as a real repo id.
- The store-layer "only assign if non-None" guard on
  ``update_sync_state`` and ``upsert`` (already pinned by tests in
  #240) is what protects populated IDs from being overwritten on a
  failed re-sync. The syncer now always passes populated IDs on the
  success path, so the guard's job is to fence the *failure* path
  (where ``_record_failure`` calls ``update_sync_state`` without ID
  kwargs). The new
  ``test_sync_does_not_overwrite_populated_ids_on_resync`` test pins
  the contract end-to-end so a future syncer refactor that started
  passing ``None`` explicitly cannot zero a binding's IDs.
- ``GitHubMock.seed_tree`` now auto-seeds the
  ``GET /repos/{owner}/{repo}`` endpoint with default ``(repo_id=1,
  owner_id=1)`` so existing tree-fetcher tests that do not care
  about ID capture keep working with a single ``seed_tree(...)``
  call. Tests that round-trip captured IDs into the database pass
  ``repo_id=`` / ``owner_id=`` to ``seed_tree`` and the auto-call
  uses those instead. ``seed_repo`` is exposed separately for tests
  that bypass ``seed_tree`` (the syncer's GitHub-error fixture
  needs the repo endpoint mocked but the ref endpoint to return
  404, so it calls ``seed_repo`` then mocks ``commits/{ref}``
  directly).
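The fixture behavior described above can be sketched minimally; ``seed_tree`` and ``seed_repo`` are the names from the text, while the route-dict internals and response shapes are illustrative stand-ins for the real mock.

```python
class GitHubMock:
    """Illustrative sketch of the test mock's auto-seeding behavior."""

    def __init__(self) -> None:
        self.routes: dict[str, dict] = {}

    def seed_repo(
        self, owner: str, repo: str, *, repo_id: int = 1, owner_id: int = 1
    ) -> None:
        # Standalone repo-endpoint seeding, for tests that bypass seed_tree.
        self.routes[f"/repos/{owner}/{repo}"] = {
            "id": repo_id,
            "owner": {"id": owner_id},
        }

    def seed_tree(
        self,
        owner: str,
        repo: str,
        ref: str,
        tree: list[str],
        *,
        repo_id: int = 1,
        owner_id: int = 1,
    ) -> None:
        # Auto-seed the repo endpoint so tests that do not care about ID
        # capture keep working with a single seed_tree(...) call.
        self.seed_repo(owner, repo, repo_id=repo_id, owner_id=owner_id)
        self.routes[f"/repos/{owner}/{repo}/commits/{ref}"] = {"files": tree}
```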

Next-iteration notes:
- Slice #242 ("rename + installation webhooks") will consume the
  ``github_owner_id`` / ``github_repo_id`` columns this slice now
  populates. When a ``repository.renamed`` event arrives, the
  handler can match the binding by stable ID (the rewrite target)
  and update ``github_owner`` / ``github_repo`` strings without
  re-syncing content. This slice intentionally does not register
  any rename-event handlers — name-keyed pushes still match
  un-synced bindings and ID-keyed pushes still match synced ones,
  so the system is functional without rename handlers; they're a
  staleness-fix, not a correctness requirement.
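As a hedged sketch of what the slice #242 handler could look like (explicitly not part of this slice): match the binding by stable repo ID and rewrite only the name strings, with no content re-sync. The ``update_repo_names`` store call is hypothetical.

```python
from typing import Any


async def on_repository_renamed(payload: dict[str, Any], store: Any) -> None:
    # The stable numeric ID is the match key; the owner/repo strings are
    # just labels to refresh, so no dashboard_sync job is enqueued.
    repo = payload["repository"]
    await store.update_repo_names(
        github_repo_id=repo["id"],
        github_owner=repo["owner"]["login"],
        github_repo=repo["name"],
    )
```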
- The blocker merge of ``tickets/DM-54689-github-webhook`` (PR
  #264) into this branch is a no-op once #264 lands on main first.
  GitHub auto-merge will flatten the merge commit. If #264 is
  rebased instead of merged cleanly, this branch needs a manual
  re-merge.
- ``InstallationAuth.installation_id`` is now required, but the
  GitHub App secrets remain optional on ``Factory`` /
  ``WorkerFactoryBuilder``. The optionality is a feature-flag for
  the dashboard-template-from-github feature; once the feature
  graduates from optional, the three GitHub App fields can be
  tightened to non-Optional and the ``startup`` failure can be raised
  earlier — the ``InstallationAuth`` change has no bearing on that
  decision.
- The ``DashboardTemplateBinding`` client model now exposes the
  three numeric ID fields as informational. If a future client
  wants to assert on rename-robustness from the outside, it can
  observe ``github_repo_id`` flipping from ``null`` to populated
  after the first successful sync — no new endpoint needed.

Closes #241