Skip to content

DM-54689: Wire stable GitHub IDs through sync + push#274

Merged
jonathansick merged 1 commit intomainfrom
tickets/DM-54689-stable-github-ids
May 1, 2026
Merged

DM-54689: Wire stable GitHub IDs through sync + push#274
jonathansick merged 1 commit intomainfrom
tickets/DM-54689-stable-github-ids

Conversation

@jonathansick
Copy link
Copy Markdown
Member

Summary

  • Capture GitHub's stable repo / owner / installation IDs on first successful sync and write them onto the binding row + content row, so internal matching survives a GitHub repo or org rename without operator intervention.
  • Rework PushEventProcessor to look up bindings primarily by (github_repo_id, github_ref) (using the idx_dashboard_github_template_bindings_repo_id_ref index from DM-54689: Add nullable GitHub numeric ID columns to dashboard-template foundation #240) and union with a name-based fallback restricted to un-synced bindings (github_repo_id IS NULL); dedup by binding id before the root-path filter.
  • Surface the captured github_owner_id / github_repo_id / github_installation_id on the dashboard-template binding GET response for operator debugging — public API stays name-keyed.

Validation steps

  • Create an org-default binding via PUT /docverse/orgs/{org}/dashboard-template; confirm the returned body has github_owner_id, github_repo_id, and github_installation_id set to null, and last_sync_status="pending".
  • Trigger a sync (force-sync endpoint or initial enqueue from PUT) and confirm the binding GET response now shows the three numeric IDs populated, and the dashboard_github_templates content row carries github_owner_id / github_repo_id.
  • Re-sync the same binding (force-sync) and confirm none of the three numeric IDs revert to null.
  • Send a signed push webhook whose repository.id matches a synced binding but whose repository.name is different (rename simulation); confirm a dashboard_sync job is enqueued.
  • Send a signed push webhook for an un-synced binding (matching by (owner, repo, ref) strings only); confirm a dashboard_sync job is enqueued.
  • Send a signed push webhook whose (owner, repo, ref) matches a synced binding but whose repository.id differs; confirm no dashboard_sync job is enqueued (stale-name guard).

References

Capture GitHub's stable repo / owner / installation IDs on first
successful sync, write them to bindings + content rows, and have the
push event processor prefer ID-keyed binding lookup with a name-based
fallback for un-synced rows. Internal matching now survives a GitHub
repo or org rename without operator intervention; the public API
remains name-keyed and exposes the captured IDs in the binding GET
response for debugging.

Key decisions:
- ``InstallationAuth`` grew a required ``installation_id: int`` field
  rather than a separate ``get_installation_id`` round-trip from the
  syncer. The app client already resolves the installation id inside
  ``get_installation_auth`` and was discarding it; threading it onto
  the auth record is a no-cost change for production callers and lets
  the syncer write ``github_installation_id`` to the binding without
  doubling the GitHub round-trip count. The dataclass field is
  required (no default) so production code cannot accidentally
  construct an auth record without it; the test fixture's
  ``installation_auth`` helper sources it from the mock's seeded
  ``(owner, repo) -> id`` map (or ``default_installation_id``) so
  existing direct-construction tests keep their ergonomic shape.
- The tree fetcher gained a ``_fetch_repo`` step that calls
  ``GET /repos/{owner}/{repo}`` rather than overloading the existing
  ``commits/{ref}`` call, because the commits response does not
  include the repository's numeric ID and adding the field to its
  schema is GitHub's prerogative. Repo + owner IDs land on
  ``FetchedTree`` as required ``int`` fields rather than
  ``int | None`` because the syncer is the only consumer in this
  slice and it always wants both — leaving them ``Optional`` would
  have leaked ``None`` into the store-layer guard for no benefit.
  The ``raise_for_status`` chain catches a missing repo deterministi-
  cally, so a real GitHub anomaly (e.g. installation lost mid-sync)
  surfaces as a recorded sync failure rather than a silent no-id
  capture.
- ``DashboardGitHubTemplateBindingStore`` exposes two new methods
  rather than re-purposing ``list_by_repo_ref``:
  ``list_by_repo_id_and_ref`` for the ID-keyed primary lookup
  (backed by ``idx_dashboard_github_template_bindings_repo_id_ref``
  from #240) and ``list_unsynced_by_repo_ref`` whose
  ``github_repo_id IS NULL`` filter encodes the rename-safety
  invariant directly. Without that filter, a binding whose name
  coincidentally matches a *different* repo now sitting at the old
  string would re-sync against the wrong upstream; pushing the
  filter into the store query (vs. dedup-after-the-fact) keeps the
  invariant load-bearing in code rather than only in tests.
  ``list_by_repo_ref`` stays in the API surface even though the
  push processor no longer calls it, because the binding store is
  a public storage primitive and external callers (admin tooling,
  future debug consoles) may legitimately want the unfiltered
  name-only view.
- ``PushEventProcessor._lookup_bindings`` unions the two store calls
  with a ``seen: set[int]`` dedup guard. The two queries are
  disjoint by construction (the unsynced filter excludes IDs the
  primary lookup might find), so the dedup is defense-in-depth — but
  the ``test_process_dedupes_id_and_name_matches`` test pins the
  guard so a future loosening of either query (e.g. dropping the
  ``IS NULL`` filter "for symmetry") cannot regress to double-
  enqueueing the same ``dashboard_sync`` job. ``_coerce_int`` rejects
  ``bool`` explicitly because ``isinstance(True, int)`` is ``True``
  in Python and a malformed payload that sent ``"id": true`` would
  otherwise leak ``1`` through as a real repo id.
- The store-layer "only assign if non-None" guard on
  ``update_sync_state`` and ``upsert`` (already pinned by tests in
  #240) is what protects populated IDs from being overwritten on a
  failed re-sync. The syncer now always passes populated IDs on the
  success path, so the guard's job is to fence the *failure* path
  (where ``_record_failure`` calls ``update_sync_state`` without ID
  kwargs). The new ``test_sync_does_not_overwrite_populated_ids_
  on_resync`` test pins the contract end-to-end so a future syncer
  refactor that started passing ``None`` explicitly cannot zero a
  binding's IDs.
- ``GitHubMock.seed_tree`` now auto-seeds the
  ``GET /repos/{owner}/{repo}`` endpoint with default ``(repo_id=1,
  owner_id=1)`` so existing tree-fetcher tests that do not care
  about ID capture keep working with a single ``seed_tree(...)``
  call. Tests that round-trip captured IDs into the database pass
  ``repo_id=`` / ``owner_id=`` to ``seed_tree`` and the auto-call
  uses those instead. ``seed_repo`` is exposed separately for tests
  that bypass ``seed_tree`` (the syncer's GitHub-error fixture
  needs the repo endpoint mocked but the ref endpoint to return
  404, so it calls ``seed_repo`` then mocks ``commits/{ref}``
  directly).

Next-iteration notes:
- Slice #242 ("rename + installation webhooks") will consume the
  ``github_owner_id`` / ``github_repo_id`` columns this slice now
  populates. When a ``repository.renamed`` event arrives, the
  handler can match the binding by stable ID (the rewrite target)
  and update ``github_owner`` / ``github_repo`` strings without
  re-syncing content. This slice intentionally does not register
  any rename-event handlers — name-keyed pushes still match
  un-synced bindings and ID-keyed pushes still match synced ones,
  so the system is functional without rename handlers; they're a
  staleness-fix, not a correctness requirement.
- The blocker merge of ``tickets/DM-54689-github-webhook`` (PR
  #264) into this branch is a no-op once #264 lands on main first.
  GitHub auto-merge will flatten the merge commit. If #264 is
  rebased instead of merged cleanly, this branch needs a manual
  re-merge.
- ``InstallationAuth.installation_id`` is now required, but the
  GitHub App secrets remain optional on ``Factory`` /
  ``WorkerFactoryBuilder``. The optionality is a feature-flag for
  the dashboard-template-from-github feature; once the feature
  graduates from optional, the three GitHub App fields can be
  tightened to non-Optional and ``startup`` raised earlier — the
  ``InstallationAuth`` change has no bearing on that decision.
- The ``DashboardTemplateBinding`` client model now exposes the
  three numeric ID fields as informational. If a future client
  wants to assert on rename-robustness from the outside, it can
  observe ``github_repo_id`` flipping from ``null`` to populated
  after the first successful sync — no new endpoint needed.

Closes #241
@jonathansick jonathansick force-pushed the tickets/DM-54689-stable-github-ids branch from 7b5afe6 to d621bb7 Compare May 1, 2026 19:19
jonathansick added a commit to lsst-sqre/phalanx that referenced this pull request May 1, 2026
@jonathansick jonathansick merged commit 2a59aba into main May 1, 2026
22 checks passed
@jonathansick jonathansick deleted the tickets/DM-54689-stable-github-ids branch May 1, 2026 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DM-54689: Capture GitHub numeric IDs on sync + match by ID in push processor

1 participant