Skip to content

fix(insights): coerce aggregation_target type in multi-source funnels#58711

Draft
sampennington wants to merge 1 commit into
masterfrom
sam/fix-mixed-warehouse-funnel-actor-type
Draft

fix(insights): coerce aggregation_target type in multi-source funnels#58711
sampennington wants to merge 1 commit into
masterfrom
sam/fix-mixed-warehouse-funnel-actor-type

Conversation

@sampennington
Copy link
Copy Markdown
Contributor

Problem

A funnel that combines series from more than one source table — e.g. a regular events step plus a data-warehouse step — failed the whole query with ClickHouse Code: 386 ... no supertype for types UUID, String (NO_COMMON_TYPE). Surfaced from system.query_log as part of the effort to reduce deterministic query-builder bugs (dashboard). Closes #58710.

The funnel builds one subquery per source table and combines them with UNION ALL. Each source resolves aggregation_target to its own type — the events series uses person_id (UUID), a data-warehouse series uses its configured column (often a string). UNION ALL requires a common column type, and there is no supertype for UUID and String.

Changes

  • funnel_event_query.py: when more than one source query is unioned, coerce every branch's aggregation_target to a string (toString(...)).
  • Single-source funnels (the overwhelmingly common case) are untouched, so the person_id UUID actor id is preserved there.

How did you test this code?

I'm an agent. Automated tests run locally:

  • New test test_funnels_data_warehouse_and_regular_nodes_string_aggregation_target — a mixed events + data-warehouse funnel whose warehouse series aggregates by a plain string column. Verified it fails (NO_COMMON_TYPE) without the fix and passes with it.
  • Full test_funnel_data_warehouse.py suite — 10 passed; 2 snapshots updated (the aggregation_target columns of multi-source funnels now carry a toString(...) wrap).

Publish to changelog?

no

🤖 Agent context

Authored by Claude Code (Opus 4.7). Found via system.query_log analysis (exception_code = 386, no supertype for types UUID, String).

The coercion is gated on len(queries) > 1 — it only applies when sources are actually unioned, so it cannot regress the type of a single-source funnel's actor id. For a genuinely mixed funnel the actor id is already heterogeneous (person UUIDs and warehouse strings), so a uniform string is the correct common type; the funnel step counts are unaffected since toString is a bijection on the values.

Agent-authored; requires human review.

A funnel that combines series from multiple source tables (e.g. a regular
events step plus a data-warehouse step) builds one subquery per source and
combines them with UNION ALL. Each source resolves aggregation_target to
its own type — person_id is a UUID, a warehouse column is often a string —
and UNION ALL has no common type for UUID and String, failing the whole
query with NO_COMMON_TYPE.

Coerce every branch's aggregation_target to a string when more than one
source query is unioned. Single-source funnels (the common case) are
unchanged, so the person UUID actor id is preserved there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

@sampennington
Copy link
Copy Markdown
Contributor Author

Automated code review

Scoped fix that resolves the NO_COMMON_TYPE (UUID vs String) failure on UNION ALL for multi-source funnels. The approach — coercing every branch's aggregation_target to toString(...) only when len(queries) > 1 — is correct and minimal, and single-source funnels keep their native UUID actor id. Nice that the test was verified to fail without the fix.

Critical Issues

  • None blocking the funnel-summary path. But see Functional Gaps below — the actors/persons drill-down for a mixed funnel is unexercised and is the real risk.

Functional Gaps

  • Actors/persons drill-down untested for mixed funnels. actor_query() (base.py L623) aliases actor_id directly from aggregation_target, so for a mixed funnel actor_id is now a String. ActorsQueryRunner.source_table_join (actors_query_runner.py L315-336) builds an INNER JOIN with ON person.id == source.actor_id. person.id is a UUID; the join key is now UUID == String. ClickHouse join-key type matching is stricter than WHERE comparison and this can re-trigger NO_COMMON_TYPE at exactly the point a user clicks into a funnel step. The new test uses just_summarize=True, which never runs the actors query — so the riskiest path is not covered. Please add a test that runs the funnel actors/persons query for a mixed events + warehouse funnel (no just_summarize), or confirm explicitly that the drill-down is unsupported for mixed funnels and short-circuits before that join.
  • Even on the happy path, a person UUID stringified and then INNER-joined to a UUID column relies on implicit coercion. If the join silently coerces one side, formatting differences (UUID canonical form vs toString output) could drop matches. Worth an assertion on actual actor results, not just results is not None.

Convention Violations

  • None. structlog/pytest conventions n/a; assertion style matches the file.

Performance Notes

  • toString wrapping the aggregation_target defeats any UUID-typed index/partition pruning on the events subquery, but this only affects genuinely mixed funnels (rare) and the cast is on an already-computed expression — negligible. No action needed.

Improvements Suggested

  • funnel_event_query.py L164-168: the nested for loop assumes exactly one aggregation_target alias per query and silently no-ops if the alias is ever renamed/refactored. Consider asserting it patched one per query, or extract a small _coerce_aggregation_target_to_string(query) helper with that invariant, so a future column-name change fails loudly instead of silently regressing to NO_COMMON_TYPE.
  • Test comment is good; the # A mixed funnel... block explains the why well — keep it.

Positive Observations

  • Correct gating on len(queries) > 1 keeps the common single-source path byte-identical.
  • Snapshot changes look correct and surgical: exactly the four aggregation_target select expressions across the two multi-source snapshots are now toString(...)-wrapped, every other column untouched. toString is a bijection over the id values so step counts are unaffected.
  • Reproducing test verified red-before/green-after.

Overall Assessment

Request Changes — the funnel-summary fix is sound, but add (or explicitly rule out) coverage for the mixed-funnel actors/persons drill-down before merge, since the UUID == String INNER JOIN in source_table_join is the next place this same error class can surface.

Verdict: Needs changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mixed data-warehouse + events funnel fails with NO_COMMON_TYPE (UUID, String)

1 participant