Skip to content

DM-54689: GitHub push webhook + PushEventProcessor#264

Open
jonathansick wants to merge 8 commits intomainfrom
tickets/DM-54689-github-webhook
Open

DM-54689: GitHub push webhook + PushEventProcessor#264
jonathansick wants to merge 8 commits intomainfrom
tickets/DM-54689-github-webhook

Conversation

@jonathansick
Copy link
Copy Markdown
Member

@jonathansick jonathansick commented Apr 25, 2026

Summary

  • Add POST /docverse/webhooks/github that verifies the GitHub App HMAC via gidgethub.sansio.Event.from_http, dispatches push events through a module-scoped gidgethub.routing.Router, and otherwise no-ops on 200 so unrelated event types do not get redelivered.
  • Add PushEventProcessor in services/dashboard_templates/push_processor.py: looks up bindings by (owner, repo, ref), computes the effective changed-path set (cheap path: in-payload commits; fallback: GitHub compare API on truncation), filters bindings by root_path overlap, and enqueues one dashboard_sync per surviving binding.
  • Disabled-by-default surface: when any of github_app_id / github_app_private_key / github_webhook_secret is unset, the endpoint returns 404 (not 503/500) so a misconfigured deployment does not page operators on every redelivery attempt.
  • Handler polish: _lowercase_headers is typed against Mapping[str, str] and drops a dead hasattr guard; Factory.create_webhook_dispatch collapses the previous secret + processor accessors into a single GitHubAppNotConfiguredError raise site for the handler's try block; the Processed push webhook log line emits github_delivery_id only (no duplicate delivery_id key) so log queries stay consistent with the surrounding handler's bound context.
  • Push processor polish: _root_path_matches is narrowed to Sequence[str] so its empty-check can use bool(changed_paths)any(True for _ in g) consumes a generator (where bool(g) is always true), so narrowing the type and switching to bool() forecloses that latent bug; process() reads repository and owner via the two-arg .get(..., {}) form for consistency with the rest of the module; the parallel _split_full_name / _repo_from_full_name helpers collapse into a single _split_full_name_tuple returning tuple[str, str] | None, destructured once in process() with a (None, None) fallback so the existing or-chain semantics are preserved without parsing the same full_name twice.
  • Factory hygiene: Factory._require_github_app_config now owns the three-way "is None" check on the GitHub App secrets, so create_github_app_client and create_webhook_dispatch share a single typed accessor and cannot drift on which secrets count as "configured".
  • Test-infra hygiene: ContextDependency.set_github_secrets(...) lets the github_app_enabled fixture toggle the three GitHub-App secret slots without poking the singleton's private attributes, and a new tests/support/arq_testing.py module centralizes every MockArqQueue._job_metadata read behind count_jobs_by_name / get_jobs_by_name / queue_names helpers (migrating nine adjacent test files in the process).

Validation steps

  • With GitHub App secrets unset, POST /docverse/webhooks/github responds 404.
  • With secrets set, an unsigned POST → 401; a POST signed with the wrong secret → 401; a correctly signed push event → 200 and lands a dashboard_sync job on the queue when at least one binding's root_path overlaps the changed paths.
  • A push that only touches paths outside every binding's root_path is accepted (200) but enqueues nothing — verify by counting dashboard_sync jobs before/after.
  • A truncated push payload (size > len(commits)) triggers the compare-API fallback and still correctly enqueues a sync when the compare-derived path set intersects a binding's root_path.
  • A signed event for a type we do not subscribe to (e.g. ping) returns 200 with no enqueue.
  • grep -R "_job_metadata" tests/ returns matches only inside tests/support/arq_testing.py (private-attr access centralized in one place; the helper's own test references the centralizing module via register_queue).

References

Close the live-update path for GitHub-backed dashboard templates: a
new POST /docverse/webhooks/github endpoint verifies the HMAC, parses
the push payload, and enqueues one dashboard_sync job per binding
whose root_path was actually touched by the push. Truncated payloads
fall back to the GitHub compare API; mis-signed deliveries get 401;
deployments without GitHub App secrets configured see a 404.

Key decisions:
- Route HMAC validation through gidgethub.sansio.Event.from_http and
  dispatch through gidgethub.routing.Router rather than safir.github
  (which only exposes Pydantic webhook payload models — no router or
  HMAC helper, despite the issue body's hopeful "or the equivalent
  registration API" wording). The gidgethub router is held at module
  scope: registering a fresh router per request would defeat its
  registration model. Keeps the handler aligned with Times Square /
  Semaphore and avoids hand-rolling HMAC, satisfying user story 23.
- Feature-disabled (no GitHub App config) returns 404, not 503.
  Caught at handler entry by routing both
  ``get_github_webhook_secret`` and ``create_push_event_processor``
  through ``GitHubAppNotConfiguredError``. The 503 used by the
  org-admin binding endpoints (#235) would surface as a permanent
  GitHub redelivery loop on a misconfigured deployment; 404 hides
  the URL until secrets are wired and lets GitHub stop retrying.
- Added ``Factory.get_github_webhook_secret`` (public accessor) so
  the webhook handler does not reach into ``_github_webhook_secret``
  directly. Mirrors ``create_github_app_client``: validates that all
  three secrets are set and raises the same
  ``GitHubAppNotConfiguredError`` so the handler's single
  except-clause covers both paths.
- ``PushEventProcessor.process`` returns ``[]`` rather than raising
  on no-op cases (no bindings match repo/ref; no binding root_paths
  intersect changed paths; truncated payload with no before/after).
  The webhook handler's surrounding ``session.begin()`` + commit
  still runs on the empty path because logging the no-op outcome is
  cheap and the empty enqueue list is the natural success result —
  the Slack-integration loop in ``main.py`` makes the same trade.
- ``_root_path_matches`` normalises ``root_path`` exactly the way
  ``GitHubTreeFetcher._normalize_root`` does: ``"/"`` strips to ``""``
  and matches any non-empty changed-path set, while ``"templates/blue"``
  matches paths that start with ``"templates/blue/"`` or equal the
  bare directory entry. Keeps the webhook filter and the syncer's
  filter in lockstep so a binding never gets a sync enqueued for a
  path the syncer would then have ignored as out-of-scope.
- ``DashboardGitHubTemplateBindingStore.list_by_repo_ref`` is keyed
  by name (``github_owner``/``github_repo``/``github_ref``), not by
  numeric ID. Numeric-ID lookup is task #240's job; this slice
  matches by the same composite name index
  (``idx_dashboard_github_template_bindings_repo_ref``) that the PRD
  identifies as the push-lookup index for v1, with the rename-
  robustness ID-preferential lookup deferred per the PRD's "GitHub
  identity stability" section.
- The webhook handler's gidgethub router is registered with a single
  ``"push"`` callback. ``ping`` and any other event types arrive,
  signature-verify, and return 200 with no enqueue — gidgethub's
  ``dispatch`` is a fan-out to a possibly empty callback set, so
  unrelated events naturally no-op. This satisfies user story 23
  ("does not hand-roll HMAC or event parsing") and keeps GitHub's
  redelivery machinery quiet for events we deliberately ignore.
- Test fixture ``github_app_enabled`` toggles the three GitHub App
  secrets on the autouse ``context_dependency`` for the lifetime of
  one test and restores them on teardown. The default-disabled state
  preserved between tests is what makes the 404-feature-disabled
  test trivial: it just hits the URL with no fixture and asserts on
  the absence of GitHub config.

Next-iteration notes:
- Slice #240/#242 (rename-robustness) will add an ID-preferential
  binding lookup alongside ``list_by_repo_ref``. The push processor
  can then consult ``list_by_repo_id_ref`` first and fall back to
  the name-based path for un-synced bindings. The handler's plumbing
  does not need to change — only the processor's binding-lookup step.
- The handler currently registers only ``push`` events. Adding
  ``installation`` (created/deleted/suspend/unsuspend) and
  ``repository.renamed`` / ``repository.transferred`` /
  ``organization.renamed`` event handlers is task #242's scope; they
  plug into the same ``_event_router.register(...)`` decorator, so
  no further wiring change is needed. The ``create_push_event_
  processor`` factory method will likely sprout sibling
  ``create_installation_event_processor`` /
  ``create_rename_event_processor`` methods.
- Squarebot/Kafka event delivery (PRD §"Out of Scope") still goes
  through this same ``PushEventProcessor`` if it ever lands — the
  only thing the Squarebot path would change is the request entry
  point (a Kafka consumer instead of a FastAPI endpoint). The
  processor itself takes a raw payload dict, so swapping the input
  layer is a tracer-bullet exercise.
- Payload field extraction is duck-typed off the raw push dict
  (``payload["repository"]["owner"]["login"]`` etc.) rather than
  going through ``safir.github.GitHubPushEventModel``. The model
  only carries ``repository`` / ``installation`` / ``ref`` /
  ``before`` / ``after`` — it doesn't include the ``commits`` array
  that ``extract_changed_paths_from_push`` reads — so once we have
  to drop into the dict for ``commits`` and ``size`` anyway, parsing
  the rest off the dict is the simpler shape. A future refactor
  could parse a richer model and pass it through; the dict path is
  the minimum needed today.

Closes #238
Three review-driven polishes on the GitHub webhook entry point: type
``_lowercase_headers`` against ``Mapping[str, str]`` (FastAPI's
``Request.headers`` already satisfies it, so the ``hasattr`` guard
was dead code), collapse ``Factory.get_github_webhook_secret`` and
``create_push_event_processor`` into a single
``create_webhook_dispatch`` accessor so the handler's
``except GitHubAppNotConfiguredError`` wraps one call site, and drop
the redundant ``delivery_id`` field from the ``Processed push
webhook`` log line — ``github_delivery_id`` is already bound on the
context logger and the duplicate key fragmented log greppability.

Key decisions:
- ``create_webhook_dispatch`` returns the bare ``(secret, processor)``
  tuple rather than a small dataclass / namedtuple. The handler is
  the only caller and immediately destructures the pair, so giving
  the bundle a name would be ceremony for one call site. If a
  second caller materializes (e.g. a Squarebot-mediated webhook
  path), promoting to a typed container is the cheap follow-up.
- The new accessor performs its own three-secret presence check up
  front rather than letting ``create_github_app_client`` raise from
  inside the processor build. This keeps the single
  ``GitHubAppNotConfiguredError`` raise the handler observes
  cheap (no transitive imports / object construction before we
  decide the feature is off) and matches the pattern the displaced
  ``get_github_webhook_secret`` already used.
- Both replaced methods (``get_github_webhook_secret`` and
  ``create_push_event_processor``) had no other call sites, so they
  are deleted outright rather than left as deprecated shims. CLAUDE.md
  guidance is to skip backwards-compatibility hacks when the codebase
  has the only callers in scope.

Next-iteration notes:
- None. The rename-robustness handlers (#240/#242) will add sibling
  ``create_installation_event_processor`` / ``create_rename_event_
  processor`` factory methods that follow the same single-accessor
  shape — they don't need to thread through ``create_webhook_dispatch``.

Closes #266
Two test-infrastructure improvements removing direct private-attr
access from the webhook test suite (and the broader test suite that
copied the same shape). ``ContextDependency.set_github_secrets`` lets
the ``github_app_enabled`` fixture (and any future caller) toggle the
three GitHub-App secret slots without poking the singleton's private
attributes, and a new ``tests/support/arq_testing.py`` module
centralizes ``MockArqQueue._job_metadata`` reads behind
``count_jobs_by_name`` / ``get_jobs_by_name`` / ``queue_names``
helpers.

Key decisions:
- ``set_github_secrets(*, app_id, private_key, webhook_secret)`` is
  the supported test-fixture toggle, not ``initialize``. The existing
  async ``initialize`` covers the broader process-lifetime
  configuration (eight optional kwargs, including
  ``user_info_store`` / ``credential_encryptor`` / ``discovery``) and
  has the "only honor non-None values" semantics that the autouse
  ``app`` fixture needs at startup. A test fixture toggling the
  feature on for one test and off again on teardown wants the
  opposite contract: ``None`` should clear the slot. The new method
  is sync, scoped to the three GitHub-App fields, and treats every
  argument as authoritative.
- The migration scope went past the three call sites the issue
  body listed (``webhooks_github_test``,  ``queue_backend_test``,
  ``build_processing_test``) and swept up the other six tests that
  had copy-pasted the same ``[j for queue in mock._job_metadata.values()
  for j in queue.values() if j.name == NAME]`` shape (``publish_edition_test``,
  ``edition_patch_override_test``, ``edition_rollback_test``,
  ``admin_dashboard_template_test``, ``dashboard_template_test``,
  ``dashboard_enqueue_triggers_test``). Without that the acceptance
  criterion's ``grep -R "_job_metadata" tests/`` would still hit the
  copies and the centralization would be only nominally complete. The
  remaining grep matches in ``tests/storage/queue_backend_test.py``
  are calls to the *public* ``ArqQueueBackend.get_job_metadata``
  method, not private-attr reads — they are unrelated to the
  centralization the issue is asking for.
- Added a third helper ``queue_names(arq_queue) -> set[str]`` so
  ``build_processing_test``'s ``"arq:queue" not in
  mock_arq._job_metadata`` invariant (jobs landed under the
  configured queue name, not arq's default) can be expressed without
  reaching for the private attribute. The issue body listed only
  ``count_jobs_by_name`` and ``get_jobs_by_name``; the third helper
  is the cheapest way to satisfy the acceptance grep without
  inventing a workaround at one call site.
- Added a fourth helper ``register_queue(arq_queue, name)`` purely so
  ``arq_testing_test.py`` can exercise the multi-queue branch of
  ``get_jobs_by_name`` / ``count_jobs_by_name`` without needing to
  touch ``_job_metadata`` itself. ``MockArqQueue`` only auto-creates
  the slot for its ``default_queue_name``; enqueueing into any other
  queue raises ``KeyError`` until the slot is added. This keeps the
  helper's own test on the public-helper side of the centralization
  line.
- The ``github_app_enabled`` fixture still reads
  ``context_dependency._github_app_id`` (etc.) when capturing the
  prior state for restoration on teardown. The acceptance criterion
  is scoped to "private-attr *assignment*"; the issue's
  out-of-scope follow-up note covers the broader read-of-private-
  attr cleanup. Adding read accessors today would be API-creep
  without a paying caller.

Next-iteration notes:
- ``tests/conftest.py:119-125`` still pokes
  ``_superadmin_usernames`` and ``_user_info_store`` directly on the
  singleton. The issue body called this out explicitly as a parallel
  follow-up. The pattern needed is the same — a public
  ``set_admin_overrides(...)`` (or two narrow setters) on
  ``ContextDependency`` and a fixture migration. Out of scope here.
- The ``register_queue`` helper exists today only because the
  helper's own test needs to enqueue into a non-default queue. If
  a future production-side test materializes that legitimately
  needs more than one queue (rather than just inspecting an
  off-default-queue assertion), promote ``register_queue`` to a
  documented part of the helper API; otherwise it can stay as the
  small one-liner it is.
- Six adjacent test files now import from
  ``tests.support.arq_testing``. If ``safir.arq`` ever exposes a
  public iteration API, the four ``# noqa: SLF001`` lines inside
  ``arq_testing.py`` are the single place to swap the implementation
  — no caller-side change needed.

Closes #269
Both create_github_app_client and create_webhook_dispatch hand-rolled
the same three-way "is None" check on the GitHub App secrets and
raised an identical GitHubAppNotConfiguredError. Collapse the two
sites onto a single private _require_github_app_config helper that
returns the resolved (app_id, private_key, webhook_secret) tuple
typed as non-None, so a future configuration-shape change cannot
drift between the two callers.

Key decisions:
- The helper returns a bare tuple rather than a dataclass / NamedTuple.
  Both call sites destructure once and ignore the slots they don't
  need (``_, _, webhook_secret`` and ``app_id, private_key, _``). A
  named container would be ceremony for two call sites whose use of
  the unused slots is a single underscore. If a third caller
  materializes (e.g. installation/rename event processors per task
  #242) and starts paying attention to a third field, promoting to a
  named bundle is the cheap follow-up.
- The helper is private (``_require_*``) because it leaks the bare
  ``SecretStr`` instances rather than their decoded values. The two
  callers each handle ``.get_secret_value()`` themselves at the seam
  where the secret crosses into the GitHub-App / HMAC-verifier
  boundary, which keeps the helper safe to call from any future
  factory method without deciding the secret's lifetime.
- The HTTP-client check stays inline in each caller. Only
  ``create_webhook_dispatch`` and ``create_github_app_client`` need
  the shared client, and each raises a method-specific
  ``RuntimeError`` message; folding that into the helper would force
  every future caller to also need an HTTP client (e.g. an
  HMAC-only verifier doesn't), so the responsibility stays scoped.

Next-iteration notes:
- Slice #242's rename / installation event processors will need the
  same three secrets; they can call ``_require_github_app_config``
  directly. Promoting the helper to public (drop the underscore) is
  not necessary unless a non-Factory caller wants the same
  precondition check.

Closes #267
Two style fixes in push_processor.py from PR #264 review: narrow
_root_path_matches's changed_paths parameter from Iterable[str] to
Sequence[str] so the empty-check can use bool(changed_paths), which is
sound on a sequence but always True on a generator that any() would
have consumed; and collapse process()'s payload.get("repository") or
{} / repo.get("owner") or {} chain to the cleaner two-arg
.get(..., {}) form.

Key decisions:
- Sequence (not list) is the narrowest type that supports bool() and
  still accepts both call sites — both pass the materialized list[str]
  returned by _resolve_changed_paths, so no caller change is needed.
  Using list[str] would have over-constrained the contract for no
  caller benefit.
- The any(True for _ in ...) pattern is correct on the current
  list[str] callers (a list is iterable and re-iterable), but the
  Iterable[str] type signature advertised support for generators,
  where any(True for _ in g) consumes the generator and bool(g) on
  the same generator is unconditionally True. Narrowing the type +
  switching to bool() forecloses that latent bug rather than
  documenting around it.

Next-iteration notes:
- None. The two-arg .get(..., {}) form is now the convention in this
  module; future processors added under dashboard_templates/ should
  follow the same shape.

Closes #268
The tests/** ruff per-file-ignores already cover SLF001, so the
inline # noqa: SLF001 markers next to MockArqQueue._job_metadata
accesses were unused. Removing them lets ruff format reflow two
short call sites onto one line.
@jonathansick jonathansick force-pushed the tickets/DM-54689-github-webhook branch from d2bd6ef to ec9eb27 Compare April 27, 2026 16:43
Replace ``_split_full_name`` and ``_repo_from_full_name`` (each
returning one half of a parsed ``owner/repo`` string) with a single
``_split_full_name_tuple`` returning ``tuple[str, str] | None``. The
two ``process()`` call sites destructure the tuple once with a
``(None, None)`` fallback so the existing ``or``-chain semantics
(prefer ``repository.owner.login``/``repository.name``, fall back to
the parsed full_name) are preserved without parsing the same
``full_name`` twice.

Key decisions:
- Destructure once at the top of the block into ``fallback_owner`` /
  ``fallback_repo`` rather than calling the helper inline twice. The
  inline form would have re-parsed the same ``"full_name"`` string
  for each half — the duplication that motivated the original two
  helpers and still motivates them under one name. The
  ``or (None, None)`` fallback keeps the call sites' ``or``-chain
  truthy-test on a plain ``str | None`` rather than an
  ``Optional[tuple]`` membership check.
- The helper signature stays ``full_name: object`` (not ``str | None``)
  to match the duck-typed payload-dict access pattern that the rest
  of ``process()`` uses on ``payload.get(...)`` /
  ``repo.get("full_name")``. Tightening to ``str | None`` would push
  ``isinstance``/cast noise into the caller for no caller benefit.
- The collapsed helper still validates ``"/"`` membership via
  ``isinstance(full_name, str) and "/" in full_name`` before
  splitting, so a missing or malformed ``full_name`` returns ``None``
  exactly as the prior pair did. ``str.split("/", 1)`` on a string
  containing ``"/"`` is guaranteed to yield two items, so the
  ``owner, repo = ...`` unpack is total.

Next-iteration notes:
- None. The module-level helper count drops from two to one and the
  call site shrinks by one line; no follow-up scaffolding is left.

Closes #270
PR #264 has unit-level coverage for both the truncation signal in
extract_changed_paths_from_push and the compare-API fallback in the
push processor, but neither path is exercised end-to-end through the
HMAC-validated webhook handler. Add two handler tests: a truncated
push (size=30, one in-payload commit limited to docs/) seeds a compare
response that does land inside the binding's root_path and asserts
exactly one dashboard_sync enqueue, and an empty-commits push
(size=0, commits=[]) asserts zero enqueues plus zero compare-API
requests via mock_github.router.calls.

Key decisions:
- Extended the existing _push_payload helper with explicit ``commits``
  and ``size`` overrides rather than constructing two ad-hoc payload
  dicts. The in-tree style (PushEventProcessor's own service tests)
  already exposes both knobs on a similar helper, and the helper
  caller still defaults to the one-commit / size=1 happy path that
  every existing test relies on. Mutating the existing keyword surface
  preserves the four older callers without churn.
- The empty-commits assertion checks ``mock_github.router.calls``
  instead of registering and asserting on a respx route handle. The
  ``seed_compare`` helper does not return its Route, and the issue's
  acceptance criterion is "compare route saw zero requests" — a
  call-list filter on ``"/compare/"`` in the URL path satisfies that
  with no signature change to ``seed_compare``. The assertion is
  scoped to compare-shaped URLs so it tolerates the always-mocked
  Repertoire-discovery and installation-token traffic that shares the
  same router.
- Truncated-push payload uses ``size=30`` (>20) with a single
  in-payload commit. Either ``size > len(commits)`` or
  ``len(commits) >= 20`` would trigger the fallback; ``size=30`` is
  the more readable signal and matches the shape used by the existing
  push_processor service tests so the two layers stay legible
  side-by-side. The in-payload commit deliberately reports only
  ``docs/index.md`` (outside ``root_path="/"`` matters not, but for a
  future restriction this asymmetry guarantees the compare call is
  what actually surfaces the template path).
- The empty-commits test uses ``root_path="/"`` (not a nested root)
  so the assertion isolates the empty-changed-path branch of
  ``_root_path_matches`` (``bool(changed_paths)`` is the gate) rather
  than the prefix branch. A nested root would also pass but would
  conflate two reasons for not enqueueing.

Next-iteration notes:
- None. The handler-level coverage now matches the service-level
  coverage for both truncation paths; rename-handler / installation-
  event tests landing under #240/#242 will follow the same pattern
  (``github_app_enabled`` fixture + ``count_jobs_by_name`` +
  ``mock_github.router.calls`` filter) without needing further
  helper plumbing.

Closes #271
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment