fix(insights): coerce histogram breakdown property to a number #58701
sampennington wants to merge 1 commit into
A trends histogram breakdown computes bin widths with `max - min` over the breakdown values. When the breakdown property is not numeric-typed (so the property-type swapper leaves it as a string), this ran `minus()` on strings and ClickHouse failed the whole query with `ILLEGAL_TYPE_OF_ARGUMENT`.

- Wrap the histogram breakdown column in `toFloat()`.
- Coerce the breakdown property the same way in the bin-bounds filter so the actors drill-in comparison stays numeric.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
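The bin math described in the commit message can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PostHog's query builder: `histogram_bins` and `bin_label` are hypothetical names, and the `0.01` padding on the last upper bound is an assumption inferred from the `[25,40.01]` bucket asserted by the regression test in this PR.

```python
from typing import Sequence, Union

Number = Union[int, float, str]


def histogram_bins(values: Sequence[Number], bin_count: int) -> list[tuple[float, float]]:
    """Split [min, max] into `bin_count` equal-width bins.

    Values are coerced with float() first, loosely mirroring the toFloat()
    wrap, so a numeric-looking string such as "10" behaves like the number 10.
    """
    nums = [float(v) for v in values]  # coercion step; harmless for real numbers
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / bin_count  # the subtraction that fails on raw strings
    # Assumption: pad the last upper bound so the maximum value lands in a bin.
    bounds = [lo + i * width for i in range(bin_count)] + [hi + 0.01]
    return list(zip(bounds[:-1], bounds[1:]))


def bin_label(lower: float, upper: float) -> str:
    """Render a bin as "[lower,upper]" (hypothetical label format)."""
    return f"[{lower:g},{upper:g}]"


labels = [bin_label(lo, hi) for lo, hi in histogram_bins(["10", "40"], bin_count=2)]
print(labels)  # ['[10,25]', '[25,40.01]']
```

Without the `float()` step, `max(values) - min(values)` on `["10", "40"]` raises `TypeError` in Python, which is loosely analogous to ClickHouse rejecting `minus()` on two strings.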
Reviews (1): Last reviewed commit: "fix(insights): coerce histogram breakdow..."
posthog/hogql_queries/insights/trends/test/test_trends_query_runner.py, lines 2053-2085

    def test_trends_histogram_breakdown_on_string_typed_property(self):
        # Numeric-looking values stored as strings register the property as String,
        # so the property-type swapper does not coerce it. The histogram bin math
        # (max - min) must still work rather than raising ILLEGAL_TYPE_OF_ARGUMENT.
        self._create_events(
            [
                SeriesTestData(
                    distinct_id="p1",
                    events=[Series(event="$pageview", timestamps=["2020-01-11T12:00:00Z"])],
                    properties={"str_amount": "10"},
                ),
                SeriesTestData(
                    distinct_id="p2",
                    events=[Series(event="$pageview", timestamps=["2020-01-12T12:00:00Z"])],
                    properties={"str_amount": "40"},
                ),
            ]
        )

        response = self._run_trends_query(
            "2020-01-11",
            "2020-01-13",
            IntervalType.DAY,
            [EventsNode(event="$pageview")],
            None,
            BreakdownFilter(
                breakdown_type=BreakdownType.EVENT,
                breakdown="str_amount",
                breakdown_histogram_bin_count=2,
            ),
        )

        assert {r["breakdown_value"] for r in response.results} == {"[10,25]", "[25,40.01]"}
**Actors drill-in path not covered by regression test**

The PR adds a `toFloat()` coercion to `_get_actors_query_where_expr` (the actors drill-in filter), but the new test only exercises the main trends query. Without a matching test that calls `to_actors_query_options()` or resolves the actors query with a `str_amount`-bucketed breakdown, the actors filter change (`toFloat(left) >= gte AND toFloat(left) < lt`) is untested. Before this PR the actors path would have crashed in the same way as the main query; a test covering that path would confirm the fix holds end-to-end (see `test_to_actors_query_options_breakdowns_histogram` for the numeric-typed counterpart at line ~3250).
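The shape of the bin-bounds filter mentioned in this comment (`toFloat(left) >= gte AND toFloat(left) < lt`) can be sketched in Python. This only illustrates why the operand is coerced before comparing; `in_bin` is a hypothetical helper, not a function from the PostHog codebase.

```python
def in_bin(value, gte: float, lt: float) -> bool:
    """Return True when `value` falls in the half-open interval [gte, lt)."""
    number = float(value)  # coerce first so the comparison is numeric
    return gte <= number < lt


# Numeric comparison after coercion.
print(in_bin("9", 10, 25))   # False, below the lower bound
print(in_bin("10", 10, 25))  # True, lower bound is inclusive
print(in_bin("25", 10, 25))  # False, upper bound is exclusive

# In Python, uncoerced strings compare lexicographically, so "9" sorts after "10".
print("9" >= "10")  # True
```

A regression test for the actors path would assert the same membership behaviour against string-typed values, using whichever helper the test suite already provides for resolving the actors query.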
**Single-case test where parameterisation would add value**

The PR description explicitly states that `toFloat()` on an already-numeric column is "a harmless no-op." That claim is not exercised by any test on this path: the existing histogram tests all use integer-valued properties (registered as `Numeric`) and live in separate, unrelated test methods. Parameterising this test to also run the already-numeric case (e.g., `breakdown_prop=10` as an integer) would assert the no-op claim directly and align with the team's preference for parameterised tests.

**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
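The parameterisation suggested here could look roughly like the sketch below. It uses the standard library's `subTest` to stay self-contained; `bucket_bounds` is a hypothetical stand-in for the histogram path, and the real test would run the trends query through whichever parameterisation helper the codebase prefers.

```python
import unittest


def bucket_bounds(values, bin_count):
    """Hypothetical stand-in for the histogram path: coerce, then compute bounds."""
    nums = [float(v) for v in values]
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / bin_count
    return [lo + i * width for i in range(bin_count + 1)]


class TestHistogramCoercion(unittest.TestCase):
    def test_string_and_numeric_inputs_agree(self):
        cases = [
            ("string_typed", ["10", "40"]),   # the path fixed by this PR
            ("already_numeric", [10, 40]),    # the claimed no-op path
        ]
        for name, values in cases:
            with self.subTest(name=name):
                self.assertEqual(bucket_bounds(values, 2), [10.0, 25.0, 40.0])


# Run the case explicitly so the sketch works when executed as a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestHistogramCoercion)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both cases share one assertion, so a regression in either the string-typed or the already-numeric path shows up as a named sub-test failure.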
Automated code review

Scope: Needs discussion (non-numeric strings still crash)

Histogram breakdown is an opt-in numeric feature, so a fully non-numeric property is arguably user error. However, a single stray non-numeric event value among thousands of numeric ones would now fail the entire query, which is a worse failure mode than today for that mixed-data case. Worth confirming the intended behavior:

My recommendation: switch both call sites to

Functional gaps in test coverage

Minor

Positive

Verdict: Needs changes, prefer
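The mixed-data concern raised in this review can be illustrated with a small Python sketch, taking the review's premise that strict coercion fails on non-numeric input. Both helpers are hypothetical; `lenient_float` shows one possible alternative behaviour and is not necessarily what the reviewer proposes.

```python
from typing import Optional


def strict_float(value) -> float:
    """Strict coercion: raises ValueError on a non-numeric string."""
    return float(value)


def lenient_float(value) -> Optional[float]:
    """Lenient coercion (hypothetical alternative): None instead of an error."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


values = ["10", "40", "n/a"]  # one stray non-numeric value among numeric ones

try:
    [strict_float(v) for v in values]
    strict_failed = False
except ValueError:
    strict_failed = True  # a single bad value fails the whole batch

# The lenient variant drops the stray value and keeps the rest usable.
usable = [n for n in (lenient_float(v) for v in values) if n is not None]
print(strict_failed, usable)  # True [10.0, 40.0]
```

Which behaviour is right depends on the product decision the review asks for: failing loudly surfaces bad data, while skipping unparseable values keeps the insight rendering.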
Problem

A trends insight with a histogram breakdown crashed the whole query with ClickHouse `Illegal types String and String of arguments of function minus` when the breakdown property wasn't numeric-typed. Surfaced from `system.query_log` as part of the effort to reduce deterministic query-builder bugs (dashboard).

The histogram bin math computes bin widths as `max - min` over the breakdown values. The histogram breakdown column was the raw property field, and a property without a `Numeric` PropertyDefinition isn't coerced by the property-type swapper, so the bin math ran `minus()` on strings.

Changes

- Wrap the histogram breakdown column (`_get_breakdown_col_expr`) in `toFloat()`.
- Coerce the breakdown property the same way in the bin-bounds filter (`_get_breakdown_expr`) so the actors drill-in `>=` / `<` comparison stays numeric.

How did you test this code?

I'm an agent. Automated tests run locally:

- `test_trends_histogram_breakdown_on_string_typed_property` (numeric values stored as strings → String-typed property), reproducing the crash and asserting correct buckets.
- `test_trends_query_runner.py` histogram + breakdown tests: 89 passed.

Publish to changelog?

no

🤖 Agent context

Authored by Claude Code (Opus 4.7). Found via `system.query_log` analysis (`exception_code = 43`, `minus` on String).

The existing histogram tests passed only because the test helper auto-registers integer-valued properties as `Numeric` (so the property-type swapper coerced them), which masked the bug. The new test stores numeric values as strings to exercise the uncoerced path. `toFloat()` of an already-numeric column is a harmless no-op, so numeric-typed breakdowns are unaffected.

Agent-authored; requires human review.