perf(cohorts): cache behavioral cohort exclusion for flag list #65602

Open
gustavohstrassburger wants to merge 8 commits into master from posthog-code/cache-behavioral-cohort-list

Conversation

@gustavohstrassburger


Problem

The cohorts list endpoint (GET /api/projects/:id/cohorts/?hide_behavioral_cohorts=true) could take several seconds server-side. The feature flag UI's cohort typeahead hits this endpoint on every keystroke, so the flag cohort picker felt broken — the list never populated fast enough to select a cohort.

The cost wasn't search or indexing. On every request the endpoint rebuilt the behavioral-cohort dependency graph: it loaded every non-deleted cohort for the team (with its filters JSONB) into memory, parsed all that JSON, and walked the graph — ignoring pagination and the search/type filters. Teams with many or large cohorts paid the full-team cost on each keystroke.

Changes

  • Trim the graph load. Only cohorts that can matter to the graph are loaded: non-static cohorts whose filters reference a behavioral node or another cohort (via a SQL prefilter). Leaf cohorts with plain person-property filters are skipped. The bare-word match has no false negatives (those node types always serialize the literal substrings); a false positive just loads one extra leaf the walk ignores.
  • Cache the result. The behavioral (flag-incompatible) cohort set is computed once per team and cached (1h TTL backstop), keyed on allow_realtime_backfilled. Typeahead keystrokes reuse it instead of rebuilding the graph. The cache is invalidated on any cohort write/delete through the existing signal hooks in dependencies.py. Flag-save validation remains the real safety net for the brief staleness window.
  • The pure graph functions moved out of the viewset into products/cohorts/backend/models/dependencies.py, next to the existing cohort-dependency cache and invalidation infrastructure. The viewset now just calls the cached helper.
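The trim and the graph walk described above can be sketched in plain Python. This is a minimal illustration, not the code in dependencies.py: the function names (`may_matter`, `iter_leaves`, `excluded_cohort_ids`) are hypothetical, the filter shape (nested `values` groups with `behavioral` and `cohort` leaf types) is an assumption, and the `allow_realtime_backfilled` distinction is left out for brevity.

```python
import json
from collections import defaultdict, deque
from typing import Iterator


def may_matter(filters: dict) -> bool:
    """Stand-in for the SQL prefilter: a bare substring match on the serialized filters.

    A false positive (e.g. a person property named "cohort_name") only loads
    one extra leaf cohort, which the walk below ignores.
    """
    text = json.dumps(filters).lower()
    return "behavioral" in text or "cohort" in text


def iter_leaves(node: dict) -> Iterator[dict]:
    """Yield leaf filter nodes, recursing through nested AND/OR groups."""
    values = node.get("values")
    if isinstance(values, list):
        for child in values:
            yield from iter_leaves(child)
    else:
        yield node


def excluded_cohort_ids(cohorts: dict[int, dict]) -> set[int]:
    """Return behavioral cohorts plus every cohort that transitively references one."""
    candidates = {cid: f for cid, f in cohorts.items() if may_matter(f)}

    behavioral: set[int] = set()
    referrers: dict[int, set[int]] = defaultdict(set)  # referenced id -> referring ids
    for cid, filters in candidates.items():
        for leaf in iter_leaves(filters.get("properties", {})):
            if leaf.get("type") == "behavioral":
                behavioral.add(cid)
            elif leaf.get("type") == "cohort":
                referrers[int(leaf["value"])].add(cid)

    # Reverse walk: start at the behavioral cohorts and follow "is referenced by" edges.
    excluded = set(behavioral)
    queue = deque(behavioral)
    while queue:
        current = queue.popleft()
        for referrer in referrers[current]:
            if referrer not in excluded:
                excluded.add(referrer)
                queue.append(referrer)
    return excluded
```

Walking the reverse edges from the behavioral cohorts means a cohort that only references plain leaf cohorts is never visited, which is why skipping those leaves at load time does not change the result.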

How did you test this code?

I'm an agent (PostHog Code), so no manual testing. Automated tests run locally:

  • New TestFlagExcludedBehavioralCohortIds suite in test_dependencies.py — trim correctness, static-cohort exemption, caching, None normalization of allow_realtime_backfilled, and cache invalidation on cohort change.
  • Existing behavioral-cohort endpoint tests and the moved find_behavioral_cohorts graph test continue to pass (23 relevant tests green).
  • ruff and ty check clean (via lint-staged on commit).


Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

🤖 Agent context

Autonomy: Human-driven (agent-assisted)

Diagnosed from a DevTools waterfall showing ~9s TTFB on the cohorts list endpoint with hide_behavioral_cohorts=true. Traced the cost to the per-request full-team graph build (not search/indexing or the local feature-flag eval). Considered three fixes — caching only, trimming the load only, or both — and went with both: the trim shrinks each computation, the cache reuses it across the typeahead burst.

Chose a 1h TTL plus signal-based invalidation (reusing the existing dependencies.py cohort-cache hooks) rather than a longer TTL, since cohort_type clears during recalculation go through queryset .update() and don't fire signals; the flag-save validation backstops any stale window. No skills were required for this change.
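The caching choice above (a TTL backstop plus explicit invalidation, keyed per team and per `allow_realtime_backfilled` value) can be sketched with a small in-memory cache. Everything here is illustrative: `TTLCache`, `cache_key`, `get_excluded_ids`, and `invalidate_team` are hypothetical names, and the real change uses the project's existing cache and signal hooks rather than this class.

```python
import time
from typing import Callable, Optional

ONE_HOUR = 3600  # TTL backstop in seconds, matching the 1h figure in the description


class TTLCache:
    """Tiny in-memory stand-in for a shared cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: dict[str, tuple[list[int], float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[list[int]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave as a miss
            return None
        return value

    def set(self, key: str, value: list[int], ttl: float) -> None:
        self._store[key] = (value, self._clock() + ttl)

    def delete_many(self, keys: list[str]) -> None:
        for key in keys:
            self._store.pop(key, None)


def cache_key(team_id: int, allow_realtime_backfilled: Optional[bool]) -> str:
    # None is normalized to False so both spellings share one cache entry.
    return f"behavioral_cohort_ids:{team_id}:{bool(allow_realtime_backfilled)}"


def get_excluded_ids(
    cache: TTLCache,
    team_id: int,
    allow_realtime_backfilled: Optional[bool],
    compute: Callable[[], set[int]],
) -> set[int]:
    key = cache_key(team_id, allow_realtime_backfilled)
    cached = cache.get(key)
    if cached is not None:  # an empty list is a valid cached result
        return set(cached)
    result = compute()
    cache.set(key, sorted(result), ttl=ONE_HOUR)
    return result


def invalidate_team(cache: TTLCache, team_id: int) -> None:
    """Called on any cohort write or delete: drop both key variants."""
    cache.delete_many([cache_key(team_id, False), cache_key(team_id, True)])
```

The TTL only matters for changes that bypass invalidation, such as the queryset `.update()` calls mentioned above; every other write clears both keys immediately.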

@greptile


Created with PostHog Code

Generated-By: PostHog Code
Task-Id: b094aee2-93a9-4790-bb1a-c5c4670f2d49

Copilot AI left a comment


Pull request overview

This PR improves performance of the cohorts list endpoint when hide_behavioral_cohorts=true (used by the feature flag cohort typeahead) by avoiding per-request full-team dependency graph rebuilds and reusing a cached per-team set of flag-incompatible (behavioral) cohorts.

Changes:

  • Added a cached helper in products/cohorts/backend/models/dependencies.py to compute and cache flag-excluded behavioral cohort IDs per team (with SQL prefiltering to trim the graph input).
  • Updated the cohorts list queryset logic to call the cached helper instead of rebuilding the dependency graph in the viewset.
  • Updated and added tests to cover the extracted graph logic and the new cache/invalidation behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| products/workflows/backend/api/hog_flow.py | Updates an internal comment reference to the new dependency-graph location. |
| products/cohorts/backend/models/test/test_dependencies.py | Adds a new test suite for cached flag-excluded behavioral cohort IDs and invalidation behavior. |
| products/cohorts/backend/models/dependencies.py | Introduces cached computation + invalidation hooks for behavioral cohort exclusion, and hosts the extracted graph-walk helpers. |
| posthog/api/test/test_cohort.py | Updates graph-walk tests to use find_behavioral_cohorts from the new module (removes viewset dependency). |
| posthog/api/cohort.py | Switches list filtering for hide_behavioral_cohorts to the cached helper. |


Comment thread products/cohorts/backend/models/dependencies.py Outdated
@greptile-apps

greptile-apps Bot commented Jun 23, 2026


Reviews (1): Last reviewed commit: "perf(cohorts): cache behavioral cohort e..." | Re-trigger Greptile

@gustavohstrassburger gustavohstrassburger marked this pull request as ready for review June 25, 2026 17:00
@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team June 25, 2026 17:01
@assign-reviewers-posthog


👀 Auto-assigned reviewers

These soft owners were skipped because they only have minor changes here. Nothing blocks merge, so self-assign if you'd like a look:

  • @PostHog/team-workflows (products/workflows/**)

Soft owners come from CODEOWNERS-soft and each product's product.yaml. Generated files and lockfiles are ignored when deciding ownership.

@posthog-project-board-bot posthog-project-board-bot Bot moved this to In Review in Feature Flags Jun 25, 2026
@greptile-apps

greptile-apps Bot commented Jun 25, 2026


Reviews (2): Last reviewed commit: "test(cohorts): improve behavioral cohort..." | Re-trigger Greptile

@haacked haacked left a comment


Clean perf win. No blocking issues; a couple of test-coverage suggestions inline.

set(cache.get(_behavioral_cohort_ids_key(self.team.id, allow_realtime_backfilled=False))), excluded
)

def test_cache_invalidated_when_cohort_changes(self) -> None:


suggestion: Only the save path has a test for behavioral-cache invalidation. cohort_deleted and the Team post_delete handler both call _invalidate_team_behavioral_cohort_cache (dependencies.py:544), but nothing deletes a cohort and re-checks, so a regression there would leave a behavioral cohort hidden from the flag picker after its dependency is deleted, with no test to catch it.

Mirror test_cache_invalidated_when_cohort_changes with a behavioral.delete(): prime both cache keys, delete, then assert both are None.

Contributor Author


Done in a7a1a27 — added test_cache_invalidated_when_cohort_deleted: primes both cache-key variants, hard-deletes the behavioral cohort, asserts both are None.

with mock.patch("django.db.transaction.on_commit", side_effect=lambda func: func()):
    yield

def test_excludes_behavioral_and_transitive_referrers_but_not_leaves(self) -> None:


suggestion: Every test here uses the flat BEHAVIORAL_FILTERS (a top-level OR with one value), so the recursion into nested AND/OR groups in check_property_values never runs. A behavioral node buried a group deeper still has to be detected: the prefilter's icontains catches it, then the walk has to recurse into the group to mark it.

Add a case mirroring this test with the behavioral value nested inside an inner AND/OR group. (A two-hop referrer chain, A → referrer → behavioral, would also exercise the reverse-walk traversal instead of a single edge lookup.)

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a7a1a27 — added test_detects_behavioral_nested_in_group_and_two_hop_referrers: behavioral node buried in an inner AND group (exercises the check_property_values recursion) plus a two-hop chain hop2 → hop1 → behavioral (exercises the reverse-walk traversal).
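To illustrate the coverage gap raised in this thread, here is a small sketch comparing a check that only inspects the top-level group with one that recurses into nested groups. The fixtures and function names are hypothetical and the filter shape is assumed; this is not the body of check_property_values.

```python
# Flat fixture: a top-level OR with the behavioral value directly inside it.
FLAT_FILTERS = {
    "type": "OR",
    "values": [{"type": "behavioral", "key": "pageview", "value": "performed_event"}],
}

# Nested fixture: the behavioral value sits one group deeper, inside an inner AND.
NESTED_FILTERS = {
    "type": "OR",
    "values": [
        {
            "type": "AND",
            "values": [
                {"type": "person", "key": "email", "value": "a@example.com"},
                {"type": "behavioral", "key": "pageview", "value": "performed_event"},
            ],
        }
    ],
}


def has_behavioral_top_level_only(properties: dict) -> bool:
    """Inspects only the direct children of the root group (no recursion)."""
    return any(v.get("type") == "behavioral" for v in properties.get("values", []))


def has_behavioral(node: dict) -> bool:
    """Recurses through nested AND/OR groups until it reaches leaf nodes."""
    values = node.get("values")
    if isinstance(values, list):
        return any(has_behavioral(child) for child in values)
    return node.get("type") == "behavioral"
```

Both functions agree on the flat fixture, so a suite built only on it cannot tell them apart; only the nested fixture distinguishes the recursive check from the shallow one.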

@github-project-automation github-project-automation Bot moved this from In Review to Approved in Feature Flags Jun 25, 2026
…multi-hop graph

Generated-By: PostHog Code
Task-Id: b094aee2-93a9-4790-bb1a-c5c4670f2d49
@gustavohstrassburger gustavohstrassburger enabled auto-merge (squash) June 25, 2026 22:40
@tests-posthog tests-posthog Bot disabled auto-merge June 25, 2026 22:48
@tests-posthog

tests-posthog Bot commented Jun 25, 2026


Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

@tests-posthog

tests-posthog Bot commented Jun 26, 2026


Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

Projects

Status: Approved

3 participants