DM-54689: Wire stable GitHub IDs through sync + push#274
Merged
jonathansick merged 1 commit intomainfrom May 1, 2026
Merged
Conversation
6 tasks
Capture GitHub's stable repo / owner / installation IDs on first
successful sync, write them to bindings + content rows, and have the
push event processor prefer ID-keyed binding lookup with a name-based
fallback for un-synced rows. Internal matching now survives a GitHub
repo or org rename without operator intervention; the public API
remains name-keyed and exposes the captured IDs in the binding GET
response for debugging.
Key decisions:
- ``InstallationAuth`` grew a required ``installation_id: int`` field
rather than a separate ``get_installation_id`` round-trip from the
syncer. The app client already resolves the installation id inside
``get_installation_auth`` and was discarding it; threading it onto
the auth record is a no-cost change for production callers and lets
the syncer write ``github_installation_id`` to the binding without
doubling the GitHub round-trip count. The dataclass field is
required (no default) so production code cannot accidentally
construct an auth record without it; the test fixture's
``installation_auth`` helper sources it from the mock's seeded
``(owner, repo) -> id`` map (or ``default_installation_id``) so
existing direct-construction tests keep their ergonomic shape.
- The tree fetcher gained a ``_fetch_repo`` step that calls
``GET /repos/{owner}/{repo}`` rather than overloading the existing
``commits/{ref}`` call, because the commits response does not
include the repository's numeric ID and adding the field to its
schema is GitHub's prerogative. Repo + owner IDs land on
``FetchedTree`` as required ``int`` fields rather than
``int | None`` because the syncer is the only consumer in this
slice and it always wants both — leaving them ``Optional`` would
have leaked ``None`` into the store-layer guard for no benefit.
The ``raise_for_status`` chain catches a missing repo deterministi-
cally, so a real GitHub anomaly (e.g. installation lost mid-sync)
surfaces as a recorded sync failure rather than a silent no-id
capture.
- ``DashboardGitHubTemplateBindingStore`` exposes two new methods
rather than re-purposing ``list_by_repo_ref``:
``list_by_repo_id_and_ref`` for the ID-keyed primary lookup
(backed by ``idx_dashboard_github_template_bindings_repo_id_ref``
from #240) and ``list_unsynced_by_repo_ref`` whose
``github_repo_id IS NULL`` filter encodes the rename-safety
invariant directly. Without that filter, a binding whose name
coincidentally matches a *different* repo now sitting at the old
string would re-sync against the wrong upstream; pushing the
filter into the store query (vs. dedup-after-the-fact) keeps the
invariant load-bearing in code rather than only in tests.
``list_by_repo_ref`` stays in the API surface even though the
push processor no longer calls it, because the binding store is
a public storage primitive and external callers (admin tooling,
future debug consoles) may legitimately want the unfiltered
name-only view.
- ``PushEventProcessor._lookup_bindings`` unions the two store calls
with a ``seen: set[int]`` dedup guard. The two queries are
disjoint by construction (the unsynced filter excludes IDs the
primary lookup might find), so the dedup is defense-in-depth — but
the ``test_process_dedupes_id_and_name_matches`` test pins the
guard so a future loosening of either query (e.g. dropping the
``IS NULL`` filter "for symmetry") cannot regress to double-
enqueueing the same ``dashboard_sync`` job. ``_coerce_int`` rejects
``bool`` explicitly because ``isinstance(True, int)`` is ``True``
in Python and a malformed payload that sent ``"id": true`` would
otherwise leak ``1`` through as a real repo id.
- The store-layer "only assign if non-None" guard on
``update_sync_state`` and ``upsert`` (already pinned by tests in
#240) is what protects populated IDs from being overwritten on a
failed re-sync. The syncer now always passes populated IDs on the
success path, so the guard's job is to fence the *failure* path
(where ``_record_failure`` calls ``update_sync_state`` without ID
kwargs). The new ``test_sync_does_not_overwrite_populated_ids_
on_resync`` test pins the contract end-to-end so a future syncer
refactor that started passing ``None`` explicitly cannot zero a
binding's IDs.
- ``GitHubMock.seed_tree`` now auto-seeds the
``GET /repos/{owner}/{repo}`` endpoint with default ``(repo_id=1,
owner_id=1)`` so existing tree-fetcher tests that do not care
about ID capture keep working with a single ``seed_tree(...)``
call. Tests that round-trip captured IDs into the database pass
``repo_id=`` / ``owner_id=`` to ``seed_tree`` and the auto-call
uses those instead. ``seed_repo`` is exposed separately for tests
that bypass ``seed_tree`` (the syncer's GitHub-error fixture
needs the repo endpoint mocked but the ref endpoint to return
404, so it calls ``seed_repo`` then mocks ``commits/{ref}``
directly).
Next-iteration notes:
- Slice #242 ("rename + installation webhooks") will consume the
``github_owner_id`` / ``github_repo_id`` columns this slice now
populates. When a ``repository.renamed`` event arrives, the
handler can match the binding by stable ID (the rewrite target)
and update ``github_owner`` / ``github_repo`` strings without
re-syncing content. This slice intentionally does not register
any rename-event handlers — name-keyed pushes still match
un-synced bindings and ID-keyed pushes still match synced ones,
so the system is functional without rename handlers; they're a
staleness-fix, not a correctness requirement.
- The blocker merge of ``tickets/DM-54689-github-webhook`` (PR
#264) into this branch is a no-op once #264 lands on main first.
GitHub auto-merge will flatten the merge commit. If #264 is
rebased instead of merged cleanly, this branch needs a manual
re-merge.
- ``InstallationAuth.installation_id`` is now required, but the
GitHub App secrets remain optional on ``Factory`` /
``WorkerFactoryBuilder``. The optionality is a feature-flag for
the dashboard-template-from-github feature; once the feature
graduates from optional, the three GitHub App fields can be
tightened to non-Optional and ``startup`` raised earlier — the
``InstallationAuth`` change has no bearing on that decision.
- The ``DashboardTemplateBinding`` client model now exposes the
three numeric ID fields as informational. If a future client
wants to assert on rename-robustness from the outside, it can
observe ``github_repo_id`` flipping from ``null`` to populated
after the first successful sync — no new endpoint needed.
Closes #241
7b5afe6 to
d621bb7
Compare
jonathansick
added a commit
to lsst-sqre/phalanx
that referenced
this pull request
May 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PushEventProcessorto look up bindings primarily by(github_repo_id, github_ref)(using theidx_dashboard_github_template_bindings_repo_id_refindex from DM-54689: Add nullable GitHub numeric ID columns to dashboard-template foundation #240) and union with a name-based fallback restricted to un-synced bindings (github_repo_id IS NULL); dedup by binding id before the root-path filter.github_owner_id/github_repo_id/github_installation_idon the dashboard-template bindingGETresponse for operator debugging — public API stays name-keyed.Validation steps
PUT /docverse/orgs/{org}/dashboard-template; confirm the returned body hasgithub_owner_id,github_repo_id, andgithub_installation_idset tonull, andlast_sync_status="pending".dashboard_github_templatescontent row carriesgithub_owner_id/github_repo_id.null.repository.idmatches a synced binding but whoserepository.nameis different (rename simulation); confirm adashboard_syncjob is enqueued.(owner, repo, ref)strings only); confirm adashboard_syncjob is enqueued.(owner, repo, ref)matches a synced binding but whoserepository.iddiffers; confirm nodashboard_syncjob is enqueued (stale-name guard).References