DM-54689: GitHub push webhook + PushEventProcessor (#264)
Merged
jonathansick merged 17 commits into main on May 1, 2026
Close the live-update path for GitHub-backed dashboard templates: a new POST /docverse/webhooks/github endpoint verifies the HMAC, parses the push payload, and enqueues one dashboard_sync job per binding whose root_path was actually touched by the push. Truncated payloads fall back to the GitHub compare API; mis-signed deliveries get 401; deployments without GitHub App secrets configured see a 404.
Key decisions:
- Route HMAC validation through gidgethub.sansio.Event.from_http and dispatch through gidgethub.routing.Router rather than safir.github (which only exposes Pydantic webhook payload models — no router or HMAC helper, despite the issue body's hopeful "or the equivalent registration API" wording). The gidgethub router is held at module scope: registering a fresh router per request would defeat its registration model. Keeps the handler aligned with Times Square / Semaphore and avoids hand-rolling HMAC, satisfying user story 23.
- Feature-disabled (no GitHub App config) returns 404, not 503. Caught at handler entry by routing both ``get_github_webhook_secret`` and ``create_push_event_processor`` through ``GitHubAppNotConfiguredError``. The 503 used by the org-admin binding endpoints (#235) would surface as a permanent GitHub redelivery loop on a misconfigured deployment; 404 hides the URL until secrets are wired and lets GitHub stop retrying.
- Added ``Factory.get_github_webhook_secret`` (public accessor) so the webhook handler does not reach into ``_github_webhook_secret`` directly. Mirrors ``create_github_app_client``: validates that all three secrets are set and raises the same ``GitHubAppNotConfiguredError`` so the handler's single except-clause covers both paths.
- ``PushEventProcessor.process`` returns ``[]`` rather than raising on no-op cases (no bindings match repo/ref; no binding root_paths intersect changed paths; truncated payload with no before/after). The webhook handler's surrounding ``session.begin()`` + commit still runs on the empty path because logging the no-op outcome is cheap and the empty enqueue list is the natural success result — the Slack-integration loop in ``main.py`` makes the same trade.
- ``_root_path_matches`` normalises ``root_path`` exactly the way ``GitHubTreeFetcher._normalize_root`` does: ``"/"`` strips to ``""`` and matches any non-empty changed-path set, while ``"templates/blue"`` matches paths that start with ``"templates/blue/"`` or equal the bare directory entry. Keeps the webhook filter and the syncer's filter in lockstep so a binding never gets a sync enqueued for a path the syncer would then have ignored as out-of-scope.
- ``DashboardGitHubTemplateBindingStore.list_by_repo_ref`` is keyed by name (``github_owner``/``github_repo``/``github_ref``), not by numeric ID. Numeric-ID lookup is task #240's job; this slice matches by the same composite name index (``idx_dashboard_github_template_bindings_repo_ref``) that the PRD identifies as the push-lookup index for v1, with the rename-robustness ID-preferential lookup deferred per the PRD's "GitHub identity stability" section.
- The webhook handler's gidgethub router is registered with a single ``"push"`` callback. ``ping`` and any other event types arrive, signature-verify, and return 200 with no enqueue — gidgethub's ``dispatch`` is a fan-out to a possibly empty callback set, so unrelated events naturally no-op. This satisfies user story 23 ("does not hand-roll HMAC or event parsing") and keeps GitHub's redelivery machinery quiet for events we deliberately ignore.
- Test fixture ``github_app_enabled`` toggles the three GitHub App secrets on the autouse ``context_dependency`` for the lifetime of one test and restores them on teardown. The default-disabled state preserved between tests is what makes the 404-feature-disabled test trivial: it just hits the URL with no fixture and asserts on the absence of GitHub config.
Next-iteration notes:
- Slice #240/#242 (rename-robustness) will add an ID-preferential binding lookup alongside ``list_by_repo_ref``. The push processor can then consult ``list_by_repo_id_ref`` first and fall back to the name-based path for un-synced bindings. The handler's plumbing does not need to change — only the processor's binding-lookup step.
- The handler currently registers only ``push`` events. Adding ``installation`` (created/deleted/suspend/unsuspend) and ``repository.renamed`` / ``repository.transferred`` / ``organization.renamed`` event handlers is task #242's scope; they plug into the same ``_event_router.register(...)`` decorator, so no further wiring change is needed. The ``create_push_event_processor`` factory method will likely sprout sibling ``create_installation_event_processor`` / ``create_rename_event_processor`` methods.
- Squarebot/Kafka event delivery (PRD §"Out of Scope") still goes through this same ``PushEventProcessor`` if it ever lands — the only thing the Squarebot path would change is the request entry point (a Kafka consumer instead of a FastAPI endpoint). The processor itself takes a raw payload dict, so swapping the input layer is a tracer-bullet exercise.
- Payload field extraction is duck-typed off the raw push dict (``payload["repository"]["owner"]["login"]`` etc.) rather than going through ``safir.github.GitHubPushEventModel``. The model only carries ``repository`` / ``installation`` / ``ref`` / ``before`` / ``after`` — it doesn't include the ``commits`` array that ``extract_changed_paths_from_push`` reads — so once we have to drop into the dict for ``commits`` and ``size`` anyway, parsing the rest off the dict is the simpler shape. A future refactor could parse a richer model and pass it through; the dict path is the minimum needed today.
Closes #238
Three review-driven polishes on the GitHub webhook entry point: type ``_lowercase_headers`` against ``Mapping[str, str]`` (FastAPI's ``Request.headers`` already satisfies it, so the ``hasattr`` guard was dead code), collapse ``Factory.get_github_webhook_secret`` and ``create_push_event_processor`` into a single ``create_webhook_dispatch`` accessor so the handler's ``except GitHubAppNotConfiguredError`` wraps one call site, and drop the redundant ``delivery_id`` field from the ``Processed push webhook`` log line — ``github_delivery_id`` is already bound on the context logger and the duplicate key fragmented log greppability.
Key decisions:
- ``create_webhook_dispatch`` returns the bare ``(secret, processor)`` tuple rather than a small dataclass / namedtuple. The handler is the only caller and immediately destructures the pair, so giving the bundle a name would be ceremony for one call site. If a second caller materializes (e.g. a Squarebot-mediated webhook path), promoting to a typed container is the cheap follow-up.
- The new accessor performs its own three-secret presence check up front rather than letting ``create_github_app_client`` raise from inside the processor build. This keeps the single ``GitHubAppNotConfiguredError`` raise the handler observes cheap (no transitive imports / object construction before we decide the feature is off) and matches the pattern the displaced ``get_github_webhook_secret`` already used.
- Both replaced methods (``get_github_webhook_secret`` and ``create_push_event_processor``) had no other call sites, so they are deleted outright rather than left as deprecated shims. CLAUDE.md guidance is to skip backwards-compatibility hacks when the codebase has the only callers in scope.
Next-iteration notes:
- None. The rename-robustness handlers (#240/#242) will add sibling ``create_installation_event_processor`` / ``create_rename_event_processor`` factory methods that follow the same single-accessor shape — they don't need to thread through ``create_webhook_dispatch``.
Closes #266
Two test-infrastructure improvements removing direct private-attr access from the webhook test suite (and the broader test suite that copied the same shape). ``ContextDependency.set_github_secrets`` lets the ``github_app_enabled`` fixture (and any future caller) toggle the three GitHub-App secret slots without poking the singleton's private attributes, and a new ``tests/support/arq_testing.py`` module centralizes ``MockArqQueue._job_metadata`` reads behind ``count_jobs_by_name`` / ``get_jobs_by_name`` / ``queue_names`` helpers.
Key decisions:
- ``set_github_secrets(*, app_id, private_key, webhook_secret)`` is the supported test-fixture toggle, not ``initialize``. The existing async ``initialize`` covers the broader process-lifetime configuration (eight optional kwargs, including ``user_info_store`` / ``credential_encryptor`` / ``discovery``) and has the "only honor non-None values" semantics that the autouse ``app`` fixture needs at startup. A test fixture toggling the feature on for one test and off again on teardown wants the opposite contract: ``None`` should clear the slot. The new method is sync, scoped to the three GitHub-App fields, and treats every argument as authoritative.
- The migration scope went past the three call sites the issue body listed (``webhooks_github_test``, ``queue_backend_test``, ``build_processing_test``) and swept up the other six tests that had copy-pasted the same ``[j for queue in mock._job_metadata.values() for j in queue.values() if j.name == NAME]`` shape (``publish_edition_test``, ``edition_patch_override_test``, ``edition_rollback_test``, ``admin_dashboard_template_test``, ``dashboard_template_test``, ``dashboard_enqueue_triggers_test``). Without that the acceptance criterion's ``grep -R "_job_metadata" tests/`` would still hit the copies and the centralization would be only nominally complete. The remaining grep matches in ``tests/storage/queue_backend_test.py`` are calls to the *public* ``ArqQueueBackend.get_job_metadata`` method, not private-attr reads — they are unrelated to the centralization the issue is asking for.
- Added a third helper ``queue_names(arq_queue) -> set[str]`` so ``build_processing_test``'s ``"arq:queue" not in mock_arq._job_metadata`` invariant (jobs landed under the configured queue name, not arq's default) can be expressed without reaching for the private attribute. The issue body listed only ``count_jobs_by_name`` and ``get_jobs_by_name``; the third helper is the cheapest way to satisfy the acceptance grep without inventing a workaround at one call site.
- Added a fourth helper ``register_queue(arq_queue, name)`` purely so ``arq_testing_test.py`` can exercise the multi-queue branch of ``get_jobs_by_name`` / ``count_jobs_by_name`` without needing to touch ``_job_metadata`` itself. ``MockArqQueue`` only auto-creates the slot for its ``default_queue_name``; enqueueing into any other queue raises ``KeyError`` until the slot is added. This keeps the helper's own test on the public-helper side of the centralization line.
- The ``github_app_enabled`` fixture still reads ``context_dependency._github_app_id`` (etc.) when capturing the prior state for restoration on teardown. The acceptance criterion is scoped to "private-attr *assignment*"; the issue's out-of-scope follow-up note covers the broader read-of-private-attr cleanup. Adding read accessors today would be API-creep without a paying caller.
Next-iteration notes:
- ``tests/conftest.py:119-125`` still pokes ``_superadmin_usernames`` and ``_user_info_store`` directly on the singleton. The issue body called this out explicitly as a parallel follow-up. The pattern needed is the same — a public ``set_admin_overrides(...)`` (or two narrow setters) on ``ContextDependency`` and a fixture migration. Out of scope here.
- The ``register_queue`` helper exists today only because the helper's own test needs to enqueue into a non-default queue. If a future production-side test materializes that legitimately needs more than one queue (rather than just inspecting an off-default-queue assertion), promote ``register_queue`` to a documented part of the helper API; otherwise it can stay as the small one-liner it is.
- Six adjacent test files now import from ``tests.support.arq_testing``. If ``safir.arq`` ever exposes a public iteration API, the four ``# noqa: SLF001`` lines inside ``arq_testing.py`` are the single place to swap the implementation — no caller-side change needed.
Closes #269
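The four helpers can be sketched as below, assuming ``_job_metadata`` maps queue name to a dict of job ID to a metadata object carrying a ``.name`` attribute — the real ``MockArqQueue`` internals may differ, and the stand-in queue object and the ``docverse:queue`` name here are illustrative:

```python
from types import SimpleNamespace
from typing import Any


def get_jobs_by_name(arq_queue: Any, name: str) -> list[Any]:
    """Every job with the given name, across all queues.

    The private-attr read lives here; callers stay on the public-helper side.
    """
    return [
        job
        for queue in arq_queue._job_metadata.values()
        for job in queue.values()
        if job.name == name
    ]


def count_jobs_by_name(arq_queue: Any, name: str) -> int:
    return len(get_jobs_by_name(arq_queue, name))


def queue_names(arq_queue: Any) -> set[str]:
    """Names of every queue that currently has a metadata slot."""
    return set(arq_queue._job_metadata)


def register_queue(arq_queue: Any, name: str) -> None:
    """Add an empty slot for a non-default queue, since the mock only
    auto-creates the slot for its default queue name."""
    arq_queue._job_metadata.setdefault(name, {})


# Exercised against a stand-in queue object:
mock = SimpleNamespace(
    _job_metadata={
        "docverse:queue": {
            "1": SimpleNamespace(name="dashboard_sync"),
            "2": SimpleNamespace(name="build"),
        }
    }
)
register_queue(mock, "other")
```

With this shape, the ``"arq:queue" not in mock_arq._job_metadata`` invariant becomes ``"arq:queue" not in queue_names(mock_arq)``.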
Both create_github_app_client and create_webhook_dispatch hand-rolled the same three-way "is None" check on the GitHub App secrets and raised an identical GitHubAppNotConfiguredError. Collapse the two sites onto a single private _require_github_app_config helper that returns the resolved (app_id, private_key, webhook_secret) tuple typed as non-None, so a future configuration-shape change cannot drift between the two callers.
Key decisions:
- The helper returns a bare tuple rather than a dataclass / NamedTuple. Both call sites destructure once and ignore the slots they don't need (``_, _, webhook_secret`` and ``app_id, private_key, _``). A named container would be ceremony for two call sites whose use of the unused slots is a single underscore. If a third caller materializes (e.g. installation/rename event processors per task #242) and starts paying attention to a third field, promoting to a named bundle is the cheap follow-up.
- The helper is private (``_require_*``) because it leaks the bare ``SecretStr`` instances rather than their decoded values. The two callers each handle ``.get_secret_value()`` themselves at the seam where the secret crosses into the GitHub-App / HMAC-verifier boundary, which keeps the helper safe to call from any future factory method without deciding the secret's lifetime.
- The HTTP-client check stays inline in each caller. Only ``create_webhook_dispatch`` and ``create_github_app_client`` need the shared client, and each raises a method-specific ``RuntimeError`` message; folding that into the helper would force every future caller to also need an HTTP client (e.g. an HMAC-only verifier doesn't), so the responsibility stays scoped.
Next-iteration notes:
- Slice #242's rename / installation event processors will need the same three secrets; they can call ``_require_github_app_config`` directly. Promoting the helper to public (drop the underscore) is not necessary unless a non-Factory caller wants the same precondition check.
Closes #267
Two style fixes in push_processor.py from PR #264 review: narrow _root_path_matches's changed_paths parameter from Iterable[str] to Sequence[str] so the empty-check can use bool(changed_paths), which is sound on a sequence but always True on a generator that any() would have consumed; and collapse process()'s payload.get("repository") or {} / repo.get("owner") or {} chain to the cleaner two-arg .get(..., {}) form.
Key decisions:
- Sequence (not list) is the narrowest type that supports bool() and still accepts both call sites — both pass the materialized list[str] returned by _resolve_changed_paths, so no caller change is needed. Using list[str] would have over-constrained the contract for no caller benefit.
- The any(True for _ in ...) pattern is correct on the current list[str] callers (a list is iterable and re-iterable), but the Iterable[str] type signature advertised support for generators, where any(True for _ in g) consumes the generator and bool(g) on the same generator is unconditionally True. Narrowing the type + switching to bool() forecloses that latent bug rather than documenting around it.
Next-iteration notes:
- None. The two-arg .get(..., {}) form is now the convention in this module; future processors added under dashboard_templates/ should follow the same shape.
Closes #268
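The latent generator bug being foreclosed is easy to demonstrate in isolation (``is_nonempty_old`` is a hypothetical stand-in for the old emptiness check):

```python
def is_nonempty_old(paths) -> bool:
    """The old emptiness check, as typed against Iterable[str]."""
    return any(True for _ in paths)


# On a generator the check is destructive:
gen = (p for p in ["docs/index.md"])
assert is_nonempty_old(gen)   # True, but the generator is now exhausted
assert list(gen) == []        # nothing left for a later prefix match
assert bool(gen)              # bool() on a generator is always True, even when empty

# On a Sequence, bool() is both sound and non-destructive:
paths = ["docs/index.md"]
assert bool(paths)
assert paths == ["docs/index.md"]  # still intact for the prefix match
assert not bool([])
```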
The tests/** ruff per-file-ignores already cover SLF001, so the inline # noqa: SLF001 markers next to MockArqQueue._job_metadata accesses were unused. Removing them lets ruff format reflow two short call sites onto one line.
Replace ``_split_full_name`` and ``_repo_from_full_name`` (each
returning one half of a parsed ``owner/repo`` string) with a single
``_split_full_name_tuple`` returning ``tuple[str, str] | None``. The
two ``process()`` call sites destructure the tuple once with a
``(None, None)`` fallback so the existing ``or``-chain semantics
(prefer ``repository.owner.login``/``repository.name``, fall back to
the parsed full_name) are preserved without parsing the same
``full_name`` twice.
Key decisions:
- Destructure once at the top of the block into ``fallback_owner`` /
``fallback_repo`` rather than calling the helper inline twice. The
inline form would have re-parsed the same ``"full_name"`` string
for each half — the duplication that motivated the original two
helpers and still motivates them under one name. The
``or (None, None)`` fallback keeps the call sites' ``or``-chain
truthy-test on a plain ``str | None`` rather than an
``Optional[tuple]`` membership check.
- The helper signature stays ``full_name: object`` (not ``str | None``)
to match the duck-typed payload-dict access pattern that the rest
of ``process()`` uses on ``payload.get(...)`` /
``repo.get("full_name")``. Tightening to ``str | None`` would push
``isinstance``/cast noise into the caller for no caller benefit.
- The collapsed helper still validates ``"/"`` membership via
``isinstance(full_name, str) and "/" in full_name`` before
splitting, so a missing or malformed ``full_name`` returns ``None``
exactly as the prior pair did. ``str.split("/", 1)`` on a string
containing ``"/"`` is guaranteed to yield two items, so the
``owner, repo = ...`` unpack is total.
Next-iteration notes:
- None. The module-level helper count drops from two to one and the
call site shrinks by one line; no follow-up scaffolding is left.
Closes #270
PR #264 has unit-level coverage for both the truncation signal in extract_changed_paths_from_push and the compare-API fallback in the push processor, but neither path is exercised end-to-end through the HMAC-validated webhook handler. Add two handler tests: a truncated push (size=30, one in-payload commit limited to docs/) seeds a compare response that does land inside the binding's root_path and asserts exactly one dashboard_sync enqueue, and an empty-commits push (size=0, commits=[]) asserts zero enqueues plus zero compare-API requests via mock_github.router.calls.
Key decisions:
- Extended the existing _push_payload helper with explicit ``commits`` and ``size`` overrides rather than constructing two ad-hoc payload dicts. The in-tree style (PushEventProcessor's own service tests) already exposes both knobs on a similar helper, and the helper caller still defaults to the one-commit / size=1 happy path that every existing test relies on. Mutating the existing keyword surface preserves the four older callers without churn.
- The empty-commits assertion checks ``mock_github.router.calls`` instead of registering and asserting on a respx route handle. The ``seed_compare`` helper does not return its Route, and the issue's acceptance criterion is "compare route saw zero requests" — a call-list filter on ``"/compare/"`` in the URL path satisfies that with no signature change to ``seed_compare``. The assertion is scoped to compare-shaped URLs so it tolerates the always-mocked Repertoire-discovery and installation-token traffic that shares the same router.
- Truncated-push payload uses ``size=30`` (>20) with a single in-payload commit. Either ``size > len(commits)`` or ``len(commits) >= 20`` would trigger the fallback; ``size=30`` is the more readable signal and matches the shape used by the existing push_processor service tests so the two layers stay legible side-by-side. The in-payload commit deliberately reports only ``docs/index.md`` (under ``root_path="/"`` the distinction is moot today, but with a future narrower root this asymmetry guarantees the compare call is what actually surfaces the template path).
- The empty-commits test uses ``root_path="/"`` (not a nested root) so the assertion isolates the empty-changed-path branch of ``_root_path_matches`` (``bool(changed_paths)`` is the gate) rather than the prefix branch. A nested root would also pass but would conflate two reasons for not enqueueing.
Next-iteration notes:
- None. The handler-level coverage now matches the service-level coverage for both truncation paths; rename-handler / installation-event tests landing under #240/#242 will follow the same pattern (``github_app_enabled`` fixture + ``count_jobs_by_name`` + ``mock_github.router.calls`` filter) without needing further helper plumbing.
Closes #271
A misconfigured GitHub App private key surfaced as ``jwt.exceptions.InvalidKeyError`` from ``GitHubAppClientFactory.get_app_jwt`` and propagated past the syncer's narrow ``except`` (``httpx.HTTPError | gidgethub.GitHubException``) into the worker's outer ``except Exception``. That branch failed the queue job but never touched the binding, so ``last_sync_status`` stayed at its server-default ``"pending"`` forever — silently lying about the deployment's GitHub-App config. Two layered fixes: add ``jwt.exceptions.InvalidKeyError`` to the syncer's auth catch (so the existing ``_record_failure`` path writes a friendly ``last_sync_error``) and extend the worker's outer ``except Exception`` to also call ``binding_store.update_sync_state(status="failed", error=...)`` before failing the queue job.
Key decisions:
- Broadened the auth catch's failure message from "GitHub App installation lookup failed" to "GitHub App authentication failed". All three exception types now caught (httpx, gidgethub, JWT key errors) are auth-time failures — the JWT-key branch never reaches the installation lookup, so the older wording would have been misleading for one of the three. The two pre-existing caught types (``httpx.HTTPError``, ``gidgethub.GitHubException``) read more accurately under the broader phrasing too, since auth failures before the lookup also count.
- Two separate ``session.begin()`` blocks in the worker's safety net (binding update first, queue-job fail second). The user-visible state on the binding is what operators read to diagnose; flipping it before the queue-job write means a transient ``queue_job_store.fail`` failure cannot leave the binding still reporting ``"pending"``. Matches the approach sketch in #276.
- Worker error message uses ``f"{type(exc).__name__}: {exc}"`` (e.g. ``"RuntimeError: unexpected syncer bug"``) rather than ``str(exc)`` alone. The syncer already writes friendly per-cause messages for the cases it anticipates; this branch only fires for the cases it doesn't, so a programming-error class name is the most useful piece of context an operator can read out of ``last_sync_error``.
- Test for the JWT path drives a real ``GitHubAppClientFactory`` with a malformed PEM rather than monkeypatching ``get_app_jwt`` to raise ``InvalidKeyError``. The point of the test is that the real pyjwt → gidgethub → safir.github call chain surfaces the exception type the syncer's tuple expects; mocking the raise would have asserted only on the catch tuple, not on the real propagation path that motivated this fix.
- Test for the worker safety net monkeypatches ``DashboardTemplateSyncer.sync`` directly to raise ``RuntimeError``. The point is "what happens when the syncer raises *anything* unexpected" — going through a realistic exception-producing dependency (e.g. a bad key, plus reverting the sync.py fix) would conflate two things and break symmetry with the existing ``test_sync_unexpected_*_error_propagates`` tests, which already use the same pattern at the service-test layer.
Next-iteration notes:
- None. The two layers (sync.py narrow catch, dashboard_sync.py safety net) cover both the specific JWT case and any future programming error / library exception the syncer doesn't catch. Future per-cause messages (e.g. distinguishing ``InvalidKeyError`` from ``DecodeError``) can refine the syncer's error string in place without touching the worker.
Closes #276
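The error-string shape written into ``last_sync_error`` by the safety net can be pinned down in two lines; ``format_unexpected_error`` is a hypothetical name here, since the commit describes the f-string inline:

```python
def format_unexpected_error(exc: BaseException) -> str:
    """Class name plus message, e.g. "RuntimeError: unexpected syncer bug".

    str(exc) alone would drop the class name, which is the most useful
    context for a programming error the syncer did not anticipate.
    """
    return f"{type(exc).__name__}: {exc}"


try:
    raise RuntimeError("unexpected syncer bug")
except RuntimeError as exc:
    message = format_unexpected_error(exc)
```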
Split ``handlers.orgs.dashboard_template`` into two routers — one for
the org-default ``/orgs/{org}/dashboard-template`` endpoints and one
for the project-override ``/orgs/{org}/projects/{project}/dashboard-
template`` endpoints — and include each on ``orgs_router`` with its
own ``tags=`` argument so the org-default routes carry ``orgs`` and
the project-override routes keep ``projects``. Brings the dashboard-
template routes in line with the convention the rest of
``handlers/orgs/__init__.py`` already follows (members / credentials /
services tagged ``orgs``; project-nested routes tagged ``projects``).
Key decisions:
- Split the module-level router rather than applying per-route
``tags=[...]`` overrides on each org-default endpoint. The PRD's
approach sketch flagged the per-route alternative as cleaner for
this single endpoint pair but more brittle for the future
``POST /orgs/{org}/dashboard-template/sync`` endpoint, which will
also need ``tags=["orgs"]``: the split-router approach gives that
next slice a no-decision home (``org_default_router.post(...)``)
whereas the per-route approach forces every new contributor to
remember to repeat the override. Cost is one extra ``include_router``
call in ``__init__.py``; benefit is the handler module has the
tag-grouping shape baked into its structure rather than spread
across six decorators.
- Route names (``get_org_dashboard_template`` etc.) are unchanged —
the issue's acceptance criterion explicitly calls out
``request.url_for(...)`` callers, and FastAPI keys named routes by
the ``name=`` kwarg on the decorator (independent of which
``APIRouter`` registered them) so the rename of the router objects
has no caller-visible effect. Verified by running the full suite —
``DashboardTemplateBindingResponse.from_domain`` and the admin-side
``request.url_for("get_org_dashboard_template", ...)`` calls all
resolve through the same router-name registry.
- The new test reads the live ``/docverse/openapi.json`` schema rather
than introspecting ``app.openapi()`` directly. The HTTP path
exercises the same FastAPI app the real clients hit and matches the
existing ``AsyncClient``-based test conventions in this file; the
in-process ``app.openapi()`` call would have been slightly cheaper
but would have required either a new fixture or a direct import of
``docverse.main.app`` (which the test file currently avoids).
Next-iteration notes:
- The next slice's ``POST /orgs/{org}/dashboard-template/sync`` route
attaches to ``org_default_router`` and will inherit ``tags=["orgs"]``
with no further wiring. The project-override sync route, when it
lands, attaches to ``project_override_router`` and inherits
``tags=["projects"]``. No follow-up needed in this module for tag
bookkeeping.
Closes #279
A misconfigured GitHub App (bad private key, wrong app_id, etc.)
previously surfaced only on the first dashboard_sync dequeue — operators
read the failure out of a worker traceback. Shift the failure left:
mint a JWT and call GET /app once at API service lifespan startup and
arq worker startup; on any exception, log a structured error and flip
a process-wide github_app_validated flag off so the binding endpoints
and webhook return their feature-disabled response for the lifetime
of the process. The service still starts.
Key decisions:
- Shared validator helper in storage/github/startup.py duck-typed
against a GitHubAppValidationState protocol (github_app_enabled,
github_app_id, set_github_app_validated). Both ContextDependency and
WorkerFactoryBuilder implement the protocol so one helper covers the
API lifespan and the worker startup hook. Building a sessionless
Factory just to call create_github_app_client() would have required
a stub session; the helper builds the GitHubAppClientFactory and
GitHubAppClient directly from the same secrets the Factory pathway
uses.
- The validator catches bare Exception. The two known failure modes
(jwt.exceptions.InvalidKeyError on parse, gidgethub.GitHubException
on a non-2xx /app response) are both subclasses of Exception, but
the safir/pyjwt/gidgethub stack also raises raw httpx errors
(connection refused, DNS failure) and a future safir update could
surface a fourth exception type. The acceptance criterion is "log
and disable, don't crash the service" — narrowing the catch would
trade a clear behaviour for a brittle list of types.
- Factory._require_github_app_config gained a second raise leg that
fires when github_app_validated is False. Single check site keeps
the binding endpoints + webhook in sync, and lets the gate raise
the same GitHubAppNotConfiguredError type the existing
secrets-unset gate raises — handlers don't need a second except
clause.
- set_github_secrets resets github_app_validated to True. A rotated
secret bundle has not yet been rejected by the startup validator;
the prior False state belongs to the prior bundle. This keeps the
test fixture pattern (set_github_secrets then run a request)
honest without a separate set_github_app_validated(value=True)
call at every fixture seam.
- The lifespan + worker startup integration tests run their own
LifespanManager / WorkerFactoryBuilder against a monkey-patched
config. They cannot reuse the conftest ``app`` fixture because that
one runs the lifespan with the real (unset) config. tests/main_test.py
adds an autouse fixture that snapshots and restores the
context_dependency singleton's secrets + validated flag so a
passing-secrets test cannot pollute a later-running
unset-secrets test.
- The lifespan test for the malformed-key case asserts on the
webhook endpoint's 404 (not the binding PUT endpoint's response).
Today's binding endpoints don't actually consult
_require_github_app_config — try_enqueue_dashboard_sync would
enqueue the job and the worker would surface the failure. The 404
on the webhook endpoint is the observable user-visible
acceptance signal; the binding endpoints' behaviour matches the
secrets-unset case (both succeed at enqueue, both fail at the
worker) by construction.
- Boolean setters (set_github_app_validated) are keyword-only
(``*, value: bool``). Ruff's FBT001/FBT002 rules apply across the
codebase and this matches the existing set_github_secrets pattern
on ContextDependency.
Next-iteration notes:
- The binding endpoints (PUT /orgs/{org}/dashboard-template etc.)
do not yet consult _require_github_app_config at the handler
layer — the design doc's 503 feature-disabled response is not
implemented for them today. Adding that gate is a separate slice;
this validator already feeds a False decision through the same
Factory checkpoint, so a future binding-endpoint gate would
inherit the validated-False behaviour with no further
context_dependency or worker plumbing.
- The worker startup test file (tests/worker/worker_startup_test.py)
drives the validator against a real WorkerFactoryBuilder rather
than running docverse.worker.main.startup end-to-end. The full
startup requires a live Redis (RedisArqQueue.initialize); a
Redis-backed worker startup test would belong in tests/integration
if it ever materialises.
Closes #278
Followup to the prior commit's startup-validator: the GitHubAppValidationState
protocol's three abstract members needed docstrings (D102), the bare-Exception
catch needed an explicit BLE001/TRY400 waiver (per the prior commit's reasoning
about not narrowing the catch list), and the test fixture's default
``webhook_secret`` needed an S107 waiver. Also drops drive-by SLF001 # noqa
comments ruff flagged as unused (the tests/** per-file-ignores already cover
SLF001).
Key decisions:
- Three # noqa waivers (BLE001, TRY400, PLR0913) on the validator carry the
reasoning inline. BLE001 is the load-bearing one — narrowing the catch to
``(InvalidKeyError, GitHubException, httpx.HTTPError)`` would miss any
future safir/pyjwt update that surfaces a fourth type, regressing the
acceptance criterion ("don't crash the service"). TRY400 (use logging.exception)
would attach a traceback that would clutter operator logs without adding
context — error_type and str(exc) already carry the diagnostically useful
fields. PLR0913 is on a startup-only function called from two sites; the
six kwargs are all distinct values that the call sites need to thread
through, so packing them into a dataclass would be ceremony without a
payoff.
- The S107 waiver on the ``webhook_secret`` fixture default scopes to the
default value, not to a real password. The fixture is in tests/main_test.py
where the standard tests/** S106 ignore already applies; S107 (function
default) is a separate code that needs its own waiver.
Next-iteration notes:
- None. This is purely a lint cleanup follow-up to the prior commit; no
behaviour change.
Closes #278
Add a nullable last_sync_queue_job_id FK on
dashboard_github_template_bindings, populate it inside
DashboardSyncEnqueuer.enqueue, and surface the queue-job URL on every
binding response (PUT/GET org-default and project-override) plus the
super-admin force-sync response. An operator who reads
last_sync_status="failed" can now click straight through to the
queue-job traceback without correlating the binding with a separate
job ID.
Key decisions:
- The FK uses ondelete=SET NULL so queue-job retention pruning never
breaks the binding row. The column is nullable permanently — pre-
existing rows have no job, a freshly-created binding has no job
until the enqueuer runs, and a pruned-job row legitimately reverts
to NULL. No backfill is needed and the migration is reversible.
- The binding store's read methods left-join queue_jobs and
materialize public_id in the same SELECT, so the response layer
builds last_sync_queue_job_url without a second round-trip. The
``Select[...]`` return type on the join helper is widened to
``Any`` because SQLAlchemy's typed-Select claims public_id is
``int`` (the column is non-nullable on the queue_jobs side); a
LEFT OUTER JOIN can yield ``int | None`` and the type stub cannot
express that distinction without overloads on outerjoin.
- ``set_last_sync_queue_job`` issues a Core UPDATE that explicitly
preserves date_updated by passing
``date_updated=SqlBinding.date_updated`` in the values dict. The
column's onupdate=now() server default fires on every ORM mutation,
but operator-visible date_updated tracks source-coordinate changes
(owner/repo/ref/root_path) — a sync-bookkeeping write should not
bump it. Without this, the existing PUT-idempotency invariant
(test_*_put_is_idempotent_no_op) regresses: the first PUT's
in-memory result.binding shows the pre-enqueue date_updated, while
the second PUT's re-fetch from DB shows the post-enqueue
date_updated, and the two responses no longer match.
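The self-assignment trick can be sketched with SQLAlchemy Core. The table below is a pared-down stand-in for the real bindings model (illustrative columns only); the mechanism relies on SQLAlchemy's documented rule that an explicit entry in ``values()`` takes precedence over a column's ``onupdate`` default:

```python
from sqlalchemy import Column, DateTime, Integer, MetaData, Table, update

metadata = MetaData()

# Pared-down stand-in for the real bindings table (illustrative columns only).
bindings = Table(
    "dashboard_github_template_bindings",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("last_sync_queue_job_id", Integer, nullable=True),
    Column("date_updated", DateTime),  # the real column carries onupdate=now()
)

# Self-assigning date_updated puts an explicit entry in values(), so the
# onupdate default never fires for this bookkeeping write.
stmt = (
    update(bindings)
    .where(bindings.c.id == 1)
    .values(
        last_sync_queue_job_id=42,
        date_updated=bindings.c.date_updated,
    )
)
```

The compiled SQL sets ``date_updated`` to its own current value rather than a bind parameter, leaving the operator-visible timestamp untouched.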
- ``try_enqueue_dashboard_sync`` returns ``QueueJob | None`` instead
of ``None``: success surfaces the just-created job so the PUT
handler can attach its public_id to the response without a second
fetch. Failure returns None and the handler keeps the prior FK
visible (or None) — matching the existing
preserve-last-good pattern that github_template_id already uses.
Existing test_*_put_enqueue_failure_marks_binding_failed callers
still see status="failed".
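The fire-and-forget contract can be reduced to a few lines. Names mirror the PR's, but the bodies are an illustrative reduction, not the real implementation:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class QueueJob:
    """Illustrative stand-in for the real queue-job record."""

    def __init__(self, public_id: str) -> None:
        self.public_id = public_id


def try_enqueue_dashboard_sync(
    enqueue: Callable[[int], QueueJob], binding_id: int
) -> Optional[QueueJob]:
    """Return the just-created job on success; log and return None on failure."""
    try:
        return enqueue(binding_id)
    except Exception as exc:
        # Swallow the failure: the PUT handler marks the binding failed
        # but the HTTP request itself still succeeds.
        logger.error("dashboard_sync enqueue failed: %s", exc)
        return None
```

On success the caller reads ``job.public_id`` straight off the return value; on failure it keeps the prior FK visible, matching the preserve-last-good pattern described above.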
- ``_attach_queue_job`` lives in the handler module rather than as a
``from_domain`` kwarg because the response model takes a domain
object and reads ``last_sync_queue_job_public_id`` off it; threading
a separate queue-job kwarg through ``from_domain`` would have made
every caller (admin GET-by-id-style sites included) decide whether
it has a queue job to attach. Keeping the override scoped to the
two PUT handlers that own the enqueue flow is narrower.
- The admin force-sync response gains queue_job_url constructed the
same way as DashboardRebuildResponse.from_queue_job — same
``request.url_for("get_queue_job", ...)`` route name. Operators can
click through without parsing queue_job_id by hand, matching the
org-binding endpoints' shape.
- The new migration n2o3p4q5r6s7 lands a separate FK index
(idx_dashboard_github_template_bindings_last_sync_queue_job_id) so
queue-job retention's ``DELETE FROM queue_jobs WHERE …`` can use
the FK index to find dependent binding rows during the SET NULL
cascade. PostgreSQL doesn't auto-index FK columns, and a binding
table with thousands of rows would otherwise force a sequential
scan on every queue-job delete.
Next-iteration notes:
- ``update_sync_state`` (called by the dashboard_sync worker) still
bumps date_updated on every status flip via the column's
onupdate=now() default. That is pre-existing behaviour — this
ticket is scoped to the enqueue-time bookkeeping write — but if a
future task wants strict source-only date_updated semantics, the
same pattern (``date_updated=SqlBinding.date_updated`` in the
Core UPDATE values dict) extends naturally.
- Sync-failure semantics deliberately preserve the prior FK (mirroring
github_template_id's preserve-last-good pattern). The acceptance
criterion "last_sync_queue_job_url remains null" is exact for newly-
created bindings; for an existing binding whose prior sync succeeded
and whose next enqueue then fails, the URL still points at the
prior queue_job. If a future task wants the FK cleared on failure
too, the failure handler in ``try_enqueue_dashboard_sync`` is the
single place to extend.
- ``last_sync_queue_job_public_id`` is materialized via a left join
on every read. If a future workload turns out to be read-heavy
enough that the join is a measurable cost, the column could
alternatively be denormalized onto the binding row
(``last_sync_queue_job_public_id`` stored directly). The current shape — FK +

join — is preferred because it keeps the public_id authoritative
on queue_jobs and lets the SET NULL cascade naturally invalidate
pruned URLs without a second write path.
Closes #277
The dashboard-template binding response previously exposed neither the
commit SHA of the most recent successful sync nor a clickable URL into
the GitHub browse UI. Add both fields to the response model. ``web_url``
is derived from the binding's source coordinates (always present once
a binding exists, regardless of sync state); ``commit_sha`` is
materialized via a new left join from
``dashboard_github_template_bindings`` to ``dashboard_github_templates``
on ``github_template_id`` and stays ``None`` until the first successful
sync links a content row.
Key decisions:
- ``build_github_browse_url`` lives in ``storage/github/web_url.py``
next to ``tree_fetcher.py`` so all GitHub-URL knowledge stays in one
place. The helper ``strip("/")`` normalises the root path: ``"/"``,
``""``, and ``"//"`` all collapse to a bare ``/tree/{ref}`` URL with
no trailing slash, while ``"/themes/blue"``, ``"themes/blue"``, and
``"/themes/blue/"`` all produce the same ``…/tree/{ref}/themes/blue``
output. Refs containing ``/`` (``release/1.0``) pass through
verbatim — GitHub's browse UI accepts embedded slashes in the
``/tree/...`` route without URL-encoding, and encoding the slash
would break the click-through.
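A minimal sketch of the helper described above, assuming the signature shown (the real function lives in ``storage/github/web_url.py``; argument names here are illustrative):

```python
def build_github_browse_url(owner: str, repo: str, ref: str, root_path: str) -> str:
    """GitHub browse URL for a binding's source coordinates."""
    base = f"https://github.com/{owner}/{repo}/tree/{ref}"
    # strip("/") normalizes the root path: "/", "", and "//" all collapse
    # to the bare /tree/{ref} URL; interior slashes pass through verbatim.
    path = root_path.strip("/")
    return f"{base}/{path}" if path else base
```

Note that a ref containing ``/`` (``release/1.0``) is embedded unencoded, since GitHub's browse UI accepts embedded slashes in the ``/tree/...`` route.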
- ``commit_sha`` is exposed via a left join rather than a denormalised
column on the binding row. Mirrors the existing
``last_sync_queue_job_public_id`` join pattern: the read methods
materialise the joined column in the same SELECT so the response
layer builds the field without a second round-trip, and the SET NULL
cascade on ``github_template_id`` naturally invalidates a stale SHA
when a content row is pruned. The single read-side join helper
``_select_with_joins`` replaces the prior ``_select_with_queue_job``
and adds the second outer join; ``_to_domain`` now takes both joined
columns and uses ``model_copy(update=…)`` only when at least one is
non-NULL so the existing
``test_get_by_id_returns_none_public_id_when_no_queue_job``-style
invariants still hold.
- The domain model gains ``commit_sha: str | None`` with ``default=None``
so ``DashboardGitHubTemplateBinding.model_validate(row)`` on a
raw ``SqlDashboardGitHubTemplateBinding`` (the
``BindingStore.create`` path, which does not go through the join
helper) keeps working unchanged: the SQL row has no ``commit_sha``
attribute, but the field's default fills the gap.
- ``DashboardTemplateBindingResponse.from_domain`` reads
``domain.commit_sha`` (not the raw SQL row) and computes ``web_url``
by calling ``build_github_browse_url`` directly; both org-default
and project-override responses go through the same ``from_domain``
classmethod so neither caller needs special-casing. The admin-side
force-sync handler returns its own response shape and does not
serialise the binding, so it is unaffected.
Next-iteration notes:
- The follow-up rename / installation event processors in #240 / #242
do not touch the read-side join — they only mutate display strings
and reachability flags on existing rows. No further plumbing is
needed for this slice.
- ``commit_sha`` is exact for the synced commit, not for any later
ref movement. If a future task wants "last successful sync commit"
to remain pinned even after a binding is rebound to a different
ref, the commit_sha column on ``dashboard_github_templates`` is
already keyed by ``(owner, repo, ref, root_path)`` so the join
follows the binding's current coordinates, not its sync history.
This matches the design doc's overwrite-in-place semantics.
Closes #281
The dashboard-template binding endpoints were not discoverable from the
parent ``Organization`` or ``Project`` resource. Add a
``dashboard_template_url`` HATEOAS field on both response models so a
client browsing the org/project resource can follow the link straight to
the binding configuration. The field is populated from the existing
``get_org_dashboard_template`` / ``get_project_dashboard_template`` route
names — no new plumbing.
Key decisions:
- Field lands on the client base ``Organization`` and ``Project`` models
(``client/src/docverse/client/models/{organizations,projects}.py``)
rather than on the user-facing handler subclass only. The acceptance
criterion explicitly enumerates four test surfaces (org-list, org-get,
project-list, project-get) and the only org-list endpoint is the
super-admin ``GET /admin/orgs`` which uses the admin
``Organization`` subclass — also extending ``_OrganizationBase``.
Adding the field on the base means both subclasses inherit it
uniformly and the URL is always present (the route exists regardless
of whether the binding has been configured), so the contract matches
the issue's "the URL itself is stable" framing. Cost: the admin
``from_domain`` needs a one-line populate; benefit: a uniform
navigation link from any org response.
- The URL is computed via ``request.url_for(...)`` exactly like the
sibling ``services_url`` / ``credentials_url`` / ``editions_url``
fields. Because all three call-sites (user-facing org, admin org,
project) live in the same FastAPI app and the binding route names
(``get_org_dashboard_template``, ``get_project_dashboard_template``)
are already registered by ``handlers/orgs/dashboard_template.py``,
no router-include changes were needed.
- The client base ``Organization`` model uses
``ConfigDict(from_attributes=True)`` (no ``extra="forbid"``), so
adding a required field does not break clients that deserialize
server responses — they pick up the new field. ``OrganizationCreate``
/ ``OrganizationUpdate`` (which *do* use ``extra="forbid"``) are
request-only models and do not gain the field, so callers cannot
accidentally smuggle a HATEOAS URL into a write payload. Same
reasoning applies to ``Project`` vs. ``ProjectCreate`` /
``ProjectUpdate``.
- Existing handler tests for the four surfaces gained one assertion
each (``dashboard_template_url`` ends with the expected path and
starts with ``http``), keeping the new coverage co-located with
the existing per-endpoint URL assertions instead of fragmenting
into a separate test module.
Next-iteration notes:
- None. The field is plumbed through the three response sites that
return an ``Organization`` or ``Project`` model. Future
``OrganizationServiceSummary`` or similar embedded representations
do not need this field — the link belongs on the top-level resource.
Closes #282
Replace ``POST /admin/dashboard-templates/{binding_id}/sync`` with two
slug-keyed routes scoped under the org/project the binding belongs to:
``POST /orgs/{org}/dashboard-template/sync`` re-syncs the org-default
binding and ``POST /orgs/{org}/projects/{project}/dashboard-template/sync``
re-syncs the project-override binding. Two motivations: the
public API must not expose database identifiers, and force-sync is a
routine operation that belongs with org admins, not super-admin only.
Key decisions:
- Both new handlers use the existing ``require_admin`` dependency.
``AuthorizationService.resolve_role()`` (services/authorization.py:31)
already grants super-admins ``OrgRole.admin`` automatically, so a
single ``require_admin`` covers "super-admin OR org-admin" without
bespoke wiring — no second handler variant or dual-dep is needed.
The slug-keyed URL already scopes the action to one org, so the
super-admin escalation flows through the same code path the binding
CRUD endpoints already use.
- The sync handlers do their own slug → binding resolution via
``DashboardTemplateBindingService.get_org_default`` /
``get_project_override`` (which raise ``NotFoundError`` → 404 when
the binding is unconfigured), then hand the resulting binding id to
``DashboardSyncEnqueuer``. The double load (service + enqueuer) is
intentional — keeping enqueue payloads keyed by binding id matches
the worker contract and avoids leaking slug strings into the queue
message — and the cost is one extra index hit per call.
- ``DashboardTemplateSyncEnqueuedResponse`` moved from
``handlers/admin/endpoints.py`` to ``handlers/orgs/models.py`` since
the model is no longer admin-scoped. Added a
``from_queue_job(binding_id, queue_job, request)`` classmethod that
mirrors the existing ``DashboardRebuildResponse.from_queue_job``
shape so both POST handlers stay one line. The ``binding_id``
field is preserved per the issue's "same response shape" acceptance
criterion — the slug context is implicit in the URL but the integer
id is still the authoritative pointer the worker logs against.
- The handlers attach to the same ``org_default_router`` /
``project_override_router`` split that #279 introduced, so the
org-default sync POST inherits ``tags=["orgs"]`` and the project-
override sync POST inherits ``tags=["projects"]`` with no per-route
override needed. Matches the convention #279 established for the
existing GET/PUT/DELETE endpoints on the same paths.
- The admin route + its tests are removed in the same change — the
PRD has not shipped, so a deprecation alias would be churn for
zero benefit. ``tests/handlers/admin_dashboard_template_test.py``
is deleted and the equivalent coverage lands inline in
``tests/handlers/dashboard_template_test.py`` next to the binding
CRUD tests it shares fixtures with.
- The OpenAPI tag test in ``test_dashboard_template_openapi_tags`` now
also asserts the legacy admin path is gone (no
``/docverse/admin/dashboard-templates/{binding_id}/sync`` entry in
``paths``). Catches a future regression where a migration alias
accidentally re-registers the old route.
Next-iteration notes:
- The response still carries the integer ``binding_id`` field. If a
later slice wants the public API fully purged of database ids, the
field can drop with no further plumbing — the URL identifies the
binding unambiguously and the queue-job public id is a separate
base32-encoded value already.
- ``try_enqueue_dashboard_sync`` (services/dashboard_templates/enqueue.py:92)
is the fire-and-forget variant the binding PUT
handlers use; the new sync handlers call ``enqueuer.enqueue``
directly because a force-sync's failure should bubble as a 500 and
surface in the response, not be silently logged like a PUT-time
enqueue. That asymmetry is deliberate and matches the prior admin
route's behaviour.
Closes #280
…dings
The push webhook silently ignored real GitHub events because the
binding lookup compared refs as exact strings. GitHub push payloads
always carry ``ref = "refs/heads/<branch>"``, but the binding store
kept whatever the operator typed — so a binding stored as ``main``
never matched the incoming ``refs/heads/main`` event. Normalize at
both write and read seams: a Pydantic ``BeforeValidator``-style
``field_validator(mode="before")`` on
``DashboardTemplateBindingCreate.github_ref`` strips a leading
``refs/heads/`` or ``refs/tags/`` prefix on input, and the push
processor strips the same prefixes from ``payload["ref"]`` before
the binding store lookup. Both sites call one shared
``normalize_github_ref`` helper.
Key decisions:
- The shared helper ``normalize_github_ref`` lives in
``client/src/docverse/client/models/dashboard_template.py`` (next to
the model whose validator uses it), not in
``services/dashboard_templates/`` as the issue's approach sketch
proposed. The client package is published independently of the
server and its dependencies are restricted to ``pydantic``,
``httpx``, and ``click`` — importing from
``docverse.services.dashboard_templates`` into a client model would
invert the package boundary and pull the entire server graph into
client distributions. The server's ``push_processor`` already
imports ``DashboardTemplateBindingCreate`` from
``docverse.client.models``, so importing the helper from the same
module is a no-cost extension. This mirrors the
``normalize_base_domain`` pattern in
``client/.../models/organizations.py``.
- ``mode="before"`` on the validator runs the prefix-strip *before*
Pydantic applies the ``min_length=1`` / ``max_length=256``
constraints from the ``Field(...)`` annotation, so ``refs/heads/``
alone (a fully-qualified ref with an empty branch name) normalizes
to ``""`` and then fails ``min_length=1`` with the standard pydantic
error message. Same for ``refs/tags/``. Length bounds stay 1–256.
- The push processor now logs ``"No bindings match push event"`` at
``info`` (was ``debug``) and includes both the raw and the
normalized refs (``github_ref_raw`` + ``github_ref``). When a future
operator hits a mismatch — different repo, mismatched branch, etc. —
the ref pair shows up at the default verbosity instead of requiring
a logging-level dial-up. The other two push-processor info lines
stay on raw-ref logging because by the time they fire the lookup has
already succeeded; the diagnostically interesting comparison is the
no-match case.
- Existing test fixtures that seeded bindings with
``github_ref="refs/heads/main"`` via the storage-level
``DashboardGitHubTemplateBindingCreate`` dataclass switch to the
bare ``"main"`` form. Those storage-direct creates bypass the
Pydantic validator, so they were testing a state that cannot exist
after the validator: in production every PUT runs through the API,
every API write normalizes, and stored refs are always canonical.
Updating the fixtures to canonical form keeps them faithful to
production data and avoids adding a third normalization site at the
storage layer.
- The two new ``test_process_matches_bare_*_against_refs_*_payload``
tests in ``push_processor_test.py`` are the asymmetric coverage the
prior fixtures didn't have: every existing fixture used
``refs/heads/main`` on both sides, which is why the bug shipped
without a failing test. A handler-level mirror in
``webhooks_github_test.py`` covers the same path through the
HMAC-validated webhook seam, and a parametrized round-trip test in
``dashboard_template_test.py`` exercises both ``refs/heads/`` and
``refs/tags/`` from the org-admin PUT side, asserting that the
response body shows the bare form and that a re-PUT with the bare
form is idempotent.
- The validator unit test in ``client/tests/`` covers the five cases
the issue's acceptance criterion enumerates: bare branch,
``refs/heads/<branch>``, ``refs/tags/<tag>``, a 40-char hex SHA
(must pass through untouched), and the empty string (must still fail
``min_length``). Plus two extras: a ref with an embedded slash
(``release/1.0`` and ``refs/tags/release/1.0``) to lock in that only
the leading prefix is stripped, and a non-tracked ref type
(``refs/pull/123/head``) to confirm the strip list is bounded to the
two prefixes operators care about.
Next-iteration notes:
- No data backfill is needed. This branch is unmerged and the only
known production binding (jsc-test-20260409) was already in bare
form; any future writes go through the new validator. If a prefixed
row ever does sneak in via a pre-validator code path, it will start
matching as soon as the webhook side normalizes the lookup ref, but
the canonical-form invariant on the row itself would be off — a
future task could add a one-shot migration that rewrites any stored
``refs/heads/...`` / ``refs/tags/...`` rows to bare form for
tidiness.
- The ``DashboardGitHubTemplateBindingCreate`` storage-level
dataclass intentionally does *not* normalize. Its only callers are
the binding service (which forwards a validated
``DashboardTemplateBindingCreate``) and tests that bypass the API.
Adding a normalize at this layer would be belt-and-suspenders, but
it would also let test fixtures lie about what production actually
stores. If a future direct-storage code path ever lands (e.g. a CLI
seeder), normalizing in ``binding_store.create`` is the right place
to add it — calling the same shared helper.
- The other two push-processor ``info`` log lines (``"Push event
affected no binding root_paths"`` and ``"Enqueued dashboard_sync
jobs from push event"``) still log ``github_ref=ref`` (the raw
value). They fire after the lookup has already matched, so the raw
value is enough; switching them to also include ``github_ref_raw``
would be log-line churn for no diagnostic gain.
Closes #283
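The shared normalization helper described above reduces to a small pure function. A sketch under the stated contract (only the two tracked prefixes are stripped, everything else passes through):

```python
def normalize_github_ref(ref: str) -> str:
    """Strip a leading refs/heads/ or refs/tags/ prefix; leave anything else alone."""
    for prefix in ("refs/heads/", "refs/tags/"):
        if ref.startswith(prefix):
            # Only the leading prefix is removed; embedded slashes in the
            # remaining branch/tag name (release/1.0) survive untouched.
            return ref[len(prefix):]
    return ref
```

With ``mode="before"`` on the Pydantic validator, ``refs/heads/`` alone normalizes to ``""`` and then fails the model's ``min_length=1`` constraint, exactly as the commit message describes.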
jonathansick added a commit that referenced this pull request on May 1, 2026
Capture GitHub's stable repo / owner / installation IDs on first
successful sync, write them to bindings + content rows, and have the
push event processor prefer ID-keyed binding lookup with a name-based
fallback for un-synced rows. Internal matching now survives a GitHub
repo or org rename without operator intervention; the public API
remains name-keyed and exposes the captured IDs in the binding GET
response for debugging.
Key decisions:
- ``InstallationAuth`` grew a required ``installation_id: int`` field
rather than a separate ``get_installation_id`` round-trip from the
syncer. The app client already resolves the installation id inside
``get_installation_auth`` and was discarding it; threading it onto
the auth record is a no-cost change for production callers and lets
the syncer write ``github_installation_id`` to the binding without
doubling the GitHub round-trip count. The dataclass field is
required (no default) so production code cannot accidentally
construct an auth record without it; the test fixture's
``installation_auth`` helper sources it from the mock's seeded
``(owner, repo) -> id`` map (or ``default_installation_id``) so
existing direct-construction tests keep their ergonomic shape.
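The required-field shape can be sketched as a dataclass. The field names follow the PR; everything else is illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstallationAuth:
    """GitHub App installation token plus the installation it was minted for."""

    token: str
    installation_id: int  # required (no default): callers cannot omit it
```

Because ``installation_id`` has no default, construction without it fails at the call site with ``TypeError`` rather than silently producing an auth record missing the id.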
- The tree fetcher gained a ``_fetch_repo`` step that calls
``GET /repos/{owner}/{repo}`` rather than overloading the existing
``commits/{ref}`` call, because the commits response does not
include the repository's numeric ID and adding the field to its
schema is GitHub's prerogative. Repo + owner IDs land on
``FetchedTree`` as required ``int`` fields rather than
``int | None`` because the syncer is the only consumer in this
slice and it always wants both — leaving them ``Optional`` would
have leaked ``None`` into the store-layer guard for no benefit.
The ``raise_for_status`` chain catches a missing repo
deterministically, so a real GitHub anomaly (e.g. installation lost mid-sync)
surfaces as a recorded sync failure rather than a silent no-id
capture.
- ``DashboardGitHubTemplateBindingStore`` exposes two new methods
rather than re-purposing ``list_by_repo_ref``:
``list_by_repo_id_and_ref`` for the ID-keyed primary lookup
(backed by ``idx_dashboard_github_template_bindings_repo_id_ref``
from #240) and ``list_unsynced_by_repo_ref`` whose
``github_repo_id IS NULL`` filter encodes the rename-safety
invariant directly. Without that filter, a binding whose name
coincidentally matches a *different* repo now sitting at the old
string would re-sync against the wrong upstream; pushing the
filter into the store query (vs. dedup-after-the-fact) keeps the
invariant load-bearing in code rather than only in tests.
``list_by_repo_ref`` stays in the API surface even though the
push processor no longer calls it, because the binding store is
a public storage primitive and external callers (admin tooling,
future debug consoles) may legitimately want the unfiltered
name-only view.
- ``PushEventProcessor._lookup_bindings`` unions the two store calls
with a ``seen: set[int]`` dedup guard. The two queries are
disjoint by construction (the unsynced filter excludes IDs the
primary lookup might find), so the dedup is defense-in-depth — but
the ``test_process_dedupes_id_and_name_matches`` test pins the
guard so a future loosening of either query (e.g. dropping the
``IS NULL`` filter "for symmetry") cannot regress to double-
enqueueing the same ``dashboard_sync`` job. ``_coerce_int`` rejects
``bool`` explicitly because ``isinstance(True, int)`` is ``True``
in Python and a malformed payload that sent ``"id": true`` would
otherwise leak ``1`` through as a real repo id.
- The store-layer "only assign if non-None" guard on
``update_sync_state`` and ``upsert`` (already pinned by tests in
#240) is what protects populated IDs from being overwritten on a
failed re-sync. The syncer now always passes populated IDs on the
success path, so the guard's job is to fence the *failure* path
(where ``_record_failure`` calls ``update_sync_state`` without ID
kwargs). The new ``test_sync_does_not_overwrite_populated_ids_on_resync``
test pins the contract end-to-end so a future syncer
refactor that started passing ``None`` explicitly cannot zero a
binding's IDs.
- ``GitHubMock.seed_tree`` now auto-seeds the
``GET /repos/{owner}/{repo}`` endpoint with default
``(repo_id=1, owner_id=1)`` so existing tree-fetcher tests that do not care
about ID capture keep working with a single ``seed_tree(...)``
call. Tests that round-trip captured IDs into the database pass
``repo_id=`` / ``owner_id=`` to ``seed_tree`` and the auto-call
uses those instead. ``seed_repo`` is exposed separately for tests
that bypass ``seed_tree`` (the syncer's GitHub-error fixture
needs the repo endpoint mocked but the ref endpoint to return
404, so it calls ``seed_repo`` then mocks ``commits/{ref}``
directly).
Next-iteration notes:
- Slice #242 ("rename + installation webhooks") will consume the
``github_owner_id`` / ``github_repo_id`` columns this slice now
populates. When a ``repository.renamed`` event arrives, the
handler can match the binding by stable ID (the rewrite target)
and update ``github_owner`` / ``github_repo`` strings without
re-syncing content. This slice intentionally does not register
any rename-event handlers — name-keyed pushes still match
un-synced bindings and ID-keyed pushes still match synced ones,
so the system is functional without rename handlers; they're a
staleness-fix, not a correctness requirement.
- The blocker merge of ``tickets/DM-54689-github-webhook`` (PR
#264) into this branch is a no-op once #264 lands on main first.
GitHub auto-merge will flatten the merge commit. If #264 is
rebased instead of merged cleanly, this branch needs a manual
re-merge.
- ``InstallationAuth.installation_id`` is now required, but the
GitHub App secrets remain optional on ``Factory`` /
``WorkerFactoryBuilder``. The optionality is a feature-flag for
the dashboard-template-from-github feature; once the feature
graduates from optional, the three GitHub App fields can be
tightened to non-Optional and ``startup`` raised earlier — the
``InstallationAuth`` change has no bearing on that decision.
- The ``DashboardTemplateBinding`` client model now exposes the
three numeric ID fields as informational. If a future client
wants to assert on rename-robustness from the outside, it can
observe ``github_repo_id`` flipping from ``null`` to populated
after the first successful sync — no new endpoint needed.
Closes #241
Summary
- ``POST /docverse/webhooks/github`` validates GitHub App HMAC
signatures, dispatches ``push`` events through
``gidgethub.routing.Router``, and enqueues exactly one
``dashboard_sync`` job per binding whose ``(owner, repo, ref)``
matches and whose ``root_path`` overlaps the changed-file set
(compare-API fallback when GitHub truncates the in-payload commit
list); unrelated event types respond 200 no-op so they don't
redeliver.
- ``field_validator(mode="before")`` strips ``refs/heads/`` /
``refs/tags/`` prefixes from
``DashboardTemplateBindingCreate.github_ref`` on input, and the push
processor strips the same prefixes from ``payload["ref"]`` before
the binding lookup. Both sites call one shared
``normalize_github_ref`` helper. Fixes a silent webhook-ignore where
a binding stored as ``main`` never matched an incoming
``refs/heads/main`` event.
- Binding responses gain ``commit_sha`` (latest synced commit,
``null`` until first success), ``web_url`` (clickable GitHub browse
URL), and ``last_sync_queue_job_url`` (back-pointer to the most
recent ``dashboard_sync`` queue job, populated atomically inside the
enqueuer and surfaced by a left join on read). A HATEOAS
``dashboard_template_url`` lands on the parent ``Organization`` and
``Project`` responses too.
- Force-sync moves from ``/admin/`` to
``POST /orgs/{org}/dashboard-template/sync`` and
``POST /orgs/{org}/projects/{project}/dashboard-template/sync`` —
same response shape, but org admins can call them without
super-admin escalation, and the public API no longer leaks integer
database ids in the URL.
- A startup validator calls ``GET /app`` once when all three GitHub
App secrets are configured. On any failure (malformed key, 401 from
GitHub, etc.) the validator logs a structured ERROR, flips a
process-wide ``github_app_validated`` flag off, and the binding
endpoints + webhook return their feature-disabled response for the
lifetime of the process — operators see misconfiguration in startup
logs instead of a worker traceback after the first dequeue.
- JWT-key parse errors and any other unanticipated syncer exception
now also flip the binding to ``last_sync_status="failed"`` with a
descriptive ``last_sync_error``, so a misconfigured GitHub App can
never leave a binding silently stuck at ``pending``.
Validation steps
- With no GitHub App secrets configured,
``POST /docverse/webhooks/github`` responds 404.
- With secrets configured, a signed ``push`` event → 200 and lands a
``dashboard_sync`` job on the queue when at least one binding's
``root_path`` overlaps the changed paths.
- A push touching no binding's ``root_path`` is accepted (200) but
enqueues nothing — verify by counting ``dashboard_sync`` jobs
before/after.
- A truncated payload (``size > len(commits)``) triggers the
compare-API fallback and still correctly enqueues a sync when the
compare-derived path set intersects a binding's ``root_path``.
- An unrelated event type (e.g. ``ping``) returns 200 with no
enqueue.
- On a valid config, the log line ``"GitHub App config validated"``
is emitted with ``app_id``, and the binding endpoints + webhook
continue to behave as enabled.
- With an ``app_id`` that GitHub rejects with 401 on ``GET /app``,
the structured ERROR log ``"GitHub App config invalid; disabling
feature"`` is emitted, the service still starts, and
``POST /docverse/webhooks/github`` thereafter responds 404. Only one
``GET /app`` is issued.
- ``PUT /docverse/orgs/{org}/dashboard-template`` returns a body
with ``last_sync_queue_job_url`` pointing at the enqueued
``dashboard_sync`` job (URL ends with
``/queue/jobs/{base32_public_id}``); a follow-up ``GET`` on the same
endpoint returns the same URL until the next sync attempt overwrites
it; a re-PUT that changes any source field flips the URL to the new
job's id.
- When the enqueue fails (``DashboardSyncEnqueuer.enqueue``), the
PUT still succeeds (201/200), the response's
``last_sync_queue_job_url`` is ``null``, and the binding row's
``last_sync_status`` is ``failed`` with a descriptive
``last_sync_error``.
- ``PUT /docverse/orgs/{org}/dashboard-template`` with
``github_ref = "refs/heads/main"`` returns 201 and the response body
shows ``github_ref = "main"``; a follow-up ``GET`` returns the same
canonical form. The same holds for ``refs/tags/v1.0`` ↔ ``v1.0``. A
re-PUT with the bare form is idempotent.
- A ``push`` webhook with ``payload.ref = "refs/heads/main"``
enqueues exactly one ``dashboard_sync`` for a binding stored with
the bare ``github_ref = "main"`` (the asymmetric case the prior
fixtures missed). Same for ``refs/tags/<tag>`` ↔ ``<tag>``.
- ``PUT /orgs/{org}/dashboard-template`` and
``GET /orgs/{org}/dashboard-template`` (and the project-override
variant) return ``commit_sha`` (``null`` before first sync, set
after), ``web_url`` (correct for both ``root_path == "/"`` — no path
segment — and a non-root path — appended without double slashes),
and ``last_sync_queue_job_url``.
- ``POST /docverse/orgs/{org}/dashboard-template/sync`` and
``POST /docverse/orgs/{org}/projects/{project}/dashboard-template/sync``
respond 202 with ``binding_id``, ``queue_job_id``, and
``queue_job_url`` (URL ends in the job's id). An org admin
(non-super-admin) can call both successfully; a super-admin can call
both successfully (existing ``resolve_role()`` escalation path); a
non-admin caller receives 403; calling either route on an
org/project with no binding configured returns 404. The legacy
``POST /admin/dashboard-templates/{binding_id}/sync`` is gone
(verify via the OpenAPI schema).
- In ``/docverse/openapi.json`` (or ``/docverse/docs``),
``GET/PUT/DELETE/POST(sync) /orgs/{org}/dashboard-template`` appear
under the ``orgs`` tag; ``GET/PUT/DELETE/POST(sync)
/orgs/{org}/projects/{project}/dashboard-template`` appear under
``projects``.
- ``GET /orgs/{org}`` and ``GET /orgs/{org}/projects/{project}``
(plus the org-list and project-list variants) return absolute
``dashboard_template_url`` values resolving to the binding
endpoints.
- Migration ``n2o3p4q5r6s7`` runs both ``upgrade head`` and
``downgrade -1`` cleanly against a populated test database — the
``last_sync_queue_job_id`` column, FK constraint, and FK index
appear after upgrade and disappear after downgrade.
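The webhook's signature check boils down to a constant-time HMAC comparison over the raw request body. A minimal stdlib sketch, assuming GitHub's documented ``X-Hub-Signature-256`` header scheme; the function name is illustrative — the PR routes validation through ``gidgethub.sansio.Event.from_http`` rather than hand-rolling this:

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Verify a GitHub X-Hub-Signature-256 header against the raw body."""
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"), body, hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking a timing side channel on mismatch.
    return hmac.compare_digest(expected, signature_header)
```

A handler would return 401 when this check fails, which is how mis-signed deliveries are rejected.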