chore(personhog): remove the persons ORM fallback, gate, and middleware#65968
chore(personhog): remove the persons ORM fallback, gate, and middleware#65968nickbest-ph wants to merge 23 commits into
Conversation
👀 Auto-assigned reviewersThese soft owners were skipped because their changes are minor, or the reviewer list was getting long. Nothing blocks merge, so self-assign if you'd like a look:
Soft owners come from |
🕸️ Eager graphHow much code each root forces the browser to download and decode through static imports — the regression class total bundle size can't see.
✅ Largest files eagerly reachable from
|
| Size | File |
|---|---|
| 899.9 KiB | src/styles/global.scss |
| 609.0 KiB | public/hedgehog/burning-money-hog.png |
| 541.9 KiB | public/hedgehog/waving-hog.png |
| 448.2 KiB | public/hedgehog/stop-sign-hog.png |
| 362.0 KiB | public/hedgehog/phone-pair-hogs.png |
| 355.9 KiB | ../node_modules/.pnpm/@posthog+icons@0.37.3_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js |
| 343.0 KiB | src/taxonomy/core-filter-definitions-by-group.json |
| 335.6 KiB | public/hedgehog/desk-hog.png |
| 323.2 KiB | public/hedgehog/3-bears-hogs.png |
| 298.6 KiB | src/lib/api.ts |
Largest files eagerly reachable from src/scenes/AuthenticatedShell.tsx
| Size | File |
|---|---|
| 899.9 KiB | src/styles/global.scss |
| 764.5 KiB | src/queries/validators.js |
| 609.0 KiB | public/hedgehog/burning-money-hog.png |
| 541.9 KiB | public/hedgehog/waving-hog.png |
| 448.2 KiB | public/hedgehog/stop-sign-hog.png |
| 362.0 KiB | public/hedgehog/phone-pair-hogs.png |
| 355.9 KiB | ../node_modules/.pnpm/@posthog+icons@0.37.3_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js |
| 343.0 KiB | src/taxonomy/core-filter-definitions-by-group.json |
| 335.6 KiB | public/hedgehog/desk-hog.png |
| 323.2 KiB | public/hedgehog/3-bears-hogs.png |
Posted automatically by check-eager-graph · sizes are input-source bytes from the esbuild metafile · part of #32479
|
Size Change: 0 B Total Size: 64 MB ℹ️ View Unchanged
|
✅ Hobby deploy smoke test: PASSEDHobby deployment smoke test passed successfully. |
e59f1bf to
4442cf5
Compare
c7372bd to
8b9ff7b
Compare
…kflow test fake_personhog_client no longer takes gate_enabled (the gate is removed); update the delete-persons workflow test calls to the no-arg form. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QcrQWDC88oyojMLDuyR7tM
Address review feedback: state the rule directly ("all calls must route through
the personhog gRPC service") instead of referencing the now-removed ORM fallback.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QcrQWDC88oyojMLDuyR7tM
…rebase master's demo refactor routes persons-DB writes through persons_db_sync (raw psycopg, decoupled from Django), so the @allow_persons_orm() decorators on the demo manager are now no-ops. Also reword the stale "before falling back to ORM" retry comment since there is no fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QcrQWDC88oyojMLDuyR7tM
The identity-matching test referenced the deleted personhog_client.gate module and seeded persons via the ORM. Convert it to create_person and teach the fake's add_person to carry last_seen_at (the proto and the proto->model converter already supported it), so person resolution reads the seeded last_seen_at through personhog. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QcrQWDC88oyojMLDuyR7tM
16b1cb5 to
b83fe62
Compare
| ) | ||
| assert response.status_code == status.HTTP_202_ACCEPTED | ||
|
|
||
| # Verify: PG distinct_id version was bumped |
There was a problem hiding this comment.
why drop the version assertions?
|
|
||
| # Tests that operate on the persons DB directly (not as person-data consumers) and | ||
| # so must NOT have the personhog fake / ORM block active. | ||
| _PERSONS_DB_DIRECT_TEST_PATHS = ( |
There was a problem hiding this comment.
this can be brittle if tests change - might be safer to use something like a @pytest.mark.persons_db_direct marker
There was a problem hiding this comment.
hmm, maybe this whole nasty block can just be removed. working on it.
There was a problem hiding this comment.
all in all lgtm! stoked to get this in. had some small comments, but gonna +1 in advance. a few things though:
- it might be worth bumping the retry count and adding UNKNOWN to our retried error types - or at least seeing how ofter we've been falling back to orm. it'd be a bummer to be returning 5xxs on pure transient issues.
- claude gets a little trigger happy with deleting test assertions and code comments, worth another pass to make sure we're not introducing any coverage regressions
|
Reviews (1): Last reviewed commit: "chore(personhog): support last_seen_at i..." | Re-trigger Greptile |
| fake = FakePersonHogClient() | ||
| set_active_fake(fake) | ||
| with ( | ||
| patch("posthog.personhog_client.client.get_personhog_client", return_value=fake), | ||
| patch("posthog.personhog_client.gate.use_personhog", return_value=True), | ||
| ): | ||
| block_persons_orm() | ||
| with patch("posthog.personhog_client.client.get_personhog_client", return_value=fake): | ||
| try: | ||
| yield fake | ||
| finally: | ||
| unblock_persons_orm() | ||
| set_active_fake(None) |
There was a problem hiding this comment.
block_persons_orm() is called before the with patch(...) scope, but unblock_persons_orm() is inside it. If patch().__enter__() ever raises (e.g. the target attribute can't be found), the thread-local ORM block will remain set for all subsequent tests on the same thread, causing spurious PersonsDBORMBlockedError failures. Moving the cleanup into an outer try/finally makes the setup and teardown symmetric.
| fake = FakePersonHogClient() | |
| set_active_fake(fake) | |
| with ( | |
| patch("posthog.personhog_client.client.get_personhog_client", return_value=fake), | |
| patch("posthog.personhog_client.gate.use_personhog", return_value=True), | |
| ): | |
| block_persons_orm() | |
| with patch("posthog.personhog_client.client.get_personhog_client", return_value=fake): | |
| try: | |
| yield fake | |
| finally: | |
| unblock_persons_orm() | |
| set_active_fake(None) | |
| fake = FakePersonHogClient() | |
| set_active_fake(fake) | |
| block_persons_orm() | |
| try: | |
| with patch("posthog.personhog_client.client.get_personhog_client", return_value=fake): | |
| yield fake | |
| finally: | |
| unblock_persons_orm() | |
| set_active_fake(None) |
| def _personhog_routed( | ||
| operation: str, | ||
| personhog_fn: Callable[[], _T], | ||
| orm_fn: Callable[[], _T], | ||
| *, | ||
| team_id: int, | ||
| ) -> _T: | ||
| """Try personhog first, fall back to ORM on failure or when disabled. | ||
| """Call personhog and record the routing metric. | ||
|
|
||
| Handles gate check, metrics, and error logging for all personhog routing. | ||
| personhog is the sole read path — there is no ORM fallback. Errors propagate | ||
| to the caller (the gRPC client handles its own transient-error retries). | ||
| """ | ||
| from posthog.personhog_client.gate import use_personhog | ||
|
|
||
| if use_personhog(): | ||
| try: | ||
| result = personhog_fn() | ||
| PERSONHOG_ROUTING_TOTAL.labels(operation=operation, source="personhog", client_name=get_client_name()).inc() | ||
| return result | ||
| except Exception: | ||
| PERSONHOG_ROUTING_ERRORS_TOTAL.labels( | ||
| operation=operation, | ||
| source="personhog", | ||
| error_type="grpc_error", | ||
| client_name=get_client_name(), | ||
| ).inc() | ||
| logger.warning("personhog_%s_failure", operation, team_id=team_id, exc_info=True) | ||
|
|
||
| PERSONHOG_ROUTING_TOTAL.labels(operation=operation, source="django_orm", client_name=get_client_name()).inc() | ||
| return orm_fn() | ||
| result = personhog_fn() | ||
| PERSONHOG_ROUTING_TOTAL.labels(operation=operation, source="personhog", client_name=get_client_name()).inc() | ||
| return result |
There was a problem hiding this comment.
Error metric gap vs. group_type_mapping path
When personhog_fn() raises, this function propagates the exception without incrementing any metric, so PERSONHOG_ROUTING_ERRORS_TOTAL is never touched for person reads. group_type_mapping._personhog_routed() still increments PERSONHOG_ROUTING_ERRORS_TOTAL on failure. The asymmetry means a personhog degradation affecting person reads is invisible to dashboards that watch that counter, while a group-type degradation is still visible.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
7ad551b to
6856d6e
Compare
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
The endpoint issues 9 framework/auth queries on the default DB; person reads route through personhog (uncounted), so the count is unchanged from the original 9. Corrects the assertNumQueries(7) over-estimate that was failing the Core shard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QcrQWDC88oyojMLDuyR7tM
Problem
personhog is fully rolled out as the source of truth for person/group/cohort data. The Django ORM fallback, the rollout gate, and the per-request gate middleware are now dead weight — every read already goes through personhog. This deletes the fallback machinery and decouples the Django test suite from the persons DB so that the persons DB Django connection can be removed in a follow-up without breaking anything.
Changes
This is being landed incrementally; the intended end state of the PR:
_personhog_routed()to personhog-only inposthog/models/person/util.pyandposthog/models/group_type_mapping.py, and drop theorm_fnbranch from every caller (deletion.py,team/util.py,products/cohorts/backend/models/util.py+cohort.py,actor_base_query.py,session_recording.py). Group-type reads keep their stale-cache / fail-closed resilience — a personhog failure is re-raised asDatabaseErrorso the existing recovery still fires.gate.py), middleware (middleware.py), and their tests; remove the middleware fromMIDDLEWAREand thePERSONHOG_ENABLED/PERSONHOG_ROLLOUT_PERCENTAGEsettings, env/compose files, and the hogli devenv generator. The fakes no longer patch the gate.posthog/test/persons.pyhelpers andbulk_create_personsseed the fake (and ClickHouse) directly instead of writing to Postgres; tests that read/assert via the ORM now go through personhog helpers. (follow-up push)personhog-replica+personhog-routerbefore demo-data generation inci-e2e-playwright.ymlandci-mcp.yml, and point the e2e mprocs config at the local router.How did you test this code?
I'm an agent. Local validation was limited to
ruffandpy_compile— the full Python test deps and Docker services aren't available in my environment, so CI on this PR is the real validator and I'm iterating against it. No manual testing performed.🤖 Agent context
Autonomy: Human-driven (agent-assisted)
Authored with Claude Code, directed by the repo owner to implement PR 4 of the personhog ORM-fallback removal and drive CI to green. Notable decisions: scoped the router's hard-block to "fake active" so migrations, production, and the e2e/demo paths (which still write demo persons via the ORM) are unaffected; preserved the group-type stale-cache resilience by re-raising personhog failures as
DatabaseError; and included PR 3's CI changes because this branch is stacked on PR 2 rather than master.🤖 Generated with Claude Code
https://claude.ai/code/session_01QcrQWDC88oyojMLDuyR7tM
Generated by Claude Code