fix(insights): coerce histogram breakdown property to a number by sampennington · Pull Request #58701 · PostHog/posthog

sampennington · 2026-05-17T12:38:49Z

Problem

A trends insight with a histogram breakdown crashed the whole query with ClickHouse Illegal types String and String of arguments of function minus when the breakdown property wasn't numeric-typed. Surfaced from system.query_log as part of the effort to reduce deterministic query-builder bugs (dashboard).

The histogram bin math computes bin widths as max - min over the breakdown values. The histogram breakdown column was the raw property field, and a property without a Numeric PropertyDefinition isn't coerced by the property-type swapper — so the bin math ran minus() on strings.

Changes

Wrap the histogram breakdown column (_get_breakdown_col_expr) in toFloat().
Coerce the breakdown property the same way in the bin-bounds filter (_get_breakdown_expr) so the actors drill-in >= / < comparison stays numeric.

How did you test this code?

I'm an agent. Automated tests run locally:

New regression test test_trends_histogram_breakdown_on_string_typed_property (numeric values stored as strings → String-typed property), reproducing the crash and asserting correct buckets.
test_trends_query_runner.py histogram + breakdown tests — 89 passed.

Publish to changelog?

no

🤖 Agent context

Authored by Claude Code (Opus 4.7). Found via system.query_log analysis (exception_code = 43, minus on String).

The existing histogram tests passed only because the test helper auto-registers integer-valued properties as Numeric (so the property-type swapper coerced them) — the bug was masked. The new test stores numeric values as strings to exercise the uncoerced path. toFloat() of an already-numeric column is a harmless no-op, so numeric-typed breakdowns are unaffected.

Agent-authored; requires human review.

A trends histogram breakdown computes bin widths with max - min over the breakdown values. When the breakdown property is not numeric-typed (so the property-type swapper leaves it as a string), this ran minus() on strings and ClickHouse failed the whole query with ILLEGAL_TYPE_OF_ARGUMENT. - Wrap the histogram breakdown column in toFloat(). - Coerce the breakdown property the same way in the bin-bounds filter so the actors drill-in comparison stays numeric. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-17T12:39:30Z

🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down.

Add the run-playwright label if you want an E2E sweep before merging — CI will pick it up automatically.

Most PRs don't need this. Real regressions still get caught on master and fix-forward.

greptile-apps · 2026-05-17T13:12:55Z

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
posthog/hogql_queries/insights/trends/test/test_trends_query_runner.py:2053-2085
**Actors drill-in path not covered by regression test**

The PR adds a `toFloat()` coercion to `_get_actors_query_where_expr` (the actors drill-in filter), but the new test only exercises the main trends query. Without a matching test that calls `to_actors_query_options()` or resolves the actors query with a `str_amount`-bucketed breakdown, the actors filter change (`toFloat(left) >= gte AND toFloat(left) < lt`) is untested. Before this PR the actors path would have crashed in the same way as the main query; a test covering that path would confirm the fix holds end-to-end (see `test_to_actors_query_options_breakdowns_histogram` for the numeric-typed counterpart at line ~3250).

### Issue 2 of 2
posthog/hogql_queries/insights/trends/test/test_trends_query_runner.py:2053-2085
**Single-case test where parameterisation would add value**

The PR description explicitly states that `toFloat()` on an already-numeric column is "a harmless no-op." That claim is not exercised by any test on this path — the existing histogram tests all use integer-valued properties (registered as `Numeric`) and live in separate, unrelated test methods. Parameterising this test to also run the already-numeric case (e.g., `breakdown_prop=10` as an integer) would assert the no-op claim directly and align with the team's preference for parameterised tests.

_{Reviews (1): Last reviewed commit: "fix(insights): coerce histogram breakdow..." | Re-trigger Greptile}

greptile-apps · 2026-05-17T13:12:59Z

+    def test_trends_histogram_breakdown_on_string_typed_property(self):
+        # Numeric-looking values stored as strings register the property as String,
+        # so the property-type swapper does not coerce it. The histogram bin math
+        # (max - min) must still work rather than raising ILLEGAL_TYPE_OF_ARGUMENT.
+        self._create_events(
+            [
+                SeriesTestData(
+                    distinct_id="p1",
+                    events=[Series(event="$pageview", timestamps=["2020-01-11T12:00:00Z"])],
+                    properties={"str_amount": "10"},
+                ),
+                SeriesTestData(
+                    distinct_id="p2",
+                    events=[Series(event="$pageview", timestamps=["2020-01-12T12:00:00Z"])],
+                    properties={"str_amount": "40"},
+                ),
+            ]
+        )
+
+        response = self._run_trends_query(
+            "2020-01-11",
+            "2020-01-13",
+            IntervalType.DAY,
+            [EventsNode(event="$pageview")],
+            None,
+            BreakdownFilter(
+                breakdown_type=BreakdownType.EVENT,
+                breakdown="str_amount",
+                breakdown_histogram_bin_count=2,
+            ),
+        )
+
+        assert {r["breakdown_value"] for r in response.results} == {"[10,25]", "[25,40.01]"}


Actors drill-in path not covered by regression test

The PR adds a toFloat() coercion to _get_actors_query_where_expr (the actors drill-in filter), but the new test only exercises the main trends query. Without a matching test that calls to_actors_query_options() or resolves the actors query with a str_amount-bucketed breakdown, the actors filter change (toFloat(left) >= gte AND toFloat(left) < lt) is untested. Before this PR the actors path would have crashed in the same way as the main query; a test covering that path would confirm the fix holds end-to-end (see test_to_actors_query_options_breakdowns_histogram for the numeric-typed counterpart at line ~3250).

Prompt To Fix With AI

This is a comment left during a code review. Path: posthog/hogql_queries/insights/trends/test/test_trends_query_runner.py Line: 2053-2085 Comment: **Actors drill-in path not covered by regression test** The PR adds a `toFloat()` coercion to `_get_actors_query_where_expr` (the actors drill-in filter), but the new test only exercises the main trends query. Without a matching test that calls `to_actors_query_options()` or resolves the actors query with a `str_amount`-bucketed breakdown, the actors filter change (`toFloat(left) >= gte AND toFloat(left) < lt`) is untested. Before this PR the actors path would have crashed in the same way as the main query; a test covering that path would confirm the fix holds end-to-end (see `test_to_actors_query_options_breakdowns_histogram` for the numeric-typed counterpart at line ~3250). How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-17T13:12:59Z

+    def test_trends_histogram_breakdown_on_string_typed_property(self):
+        # Numeric-looking values stored as strings register the property as String,
+        # so the property-type swapper does not coerce it. The histogram bin math
+        # (max - min) must still work rather than raising ILLEGAL_TYPE_OF_ARGUMENT.
+        self._create_events(
+            [
+                SeriesTestData(
+                    distinct_id="p1",
+                    events=[Series(event="$pageview", timestamps=["2020-01-11T12:00:00Z"])],
+                    properties={"str_amount": "10"},
+                ),
+                SeriesTestData(
+                    distinct_id="p2",
+                    events=[Series(event="$pageview", timestamps=["2020-01-12T12:00:00Z"])],
+                    properties={"str_amount": "40"},
+                ),
+            ]
+        )
+
+        response = self._run_trends_query(
+            "2020-01-11",
+            "2020-01-13",
+            IntervalType.DAY,
+            [EventsNode(event="$pageview")],
+            None,
+            BreakdownFilter(
+                breakdown_type=BreakdownType.EVENT,
+                breakdown="str_amount",
+                breakdown_histogram_bin_count=2,
+            ),
+        )
+
+        assert {r["breakdown_value"] for r in response.results} == {"[10,25]", "[25,40.01]"}


Single-case test where parameterisation would add value

The PR description explicitly states that toFloat() on an already-numeric column is "a harmless no-op." That claim is not exercised by any test on this path — the existing histogram tests all use integer-valued properties (registered as Numeric) and live in separate, unrelated test methods. Parameterising this test to also run the already-numeric case (e.g., breakdown_prop=10 as an integer) would assert the no-op claim directly and align with the team's preference for parameterised tests.

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: posthog/hogql_queries/insights/trends/test/test_trends_query_runner.py Line: 2053-2085 Comment: **Single-case test where parameterisation would add value** The PR description explicitly states that `toFloat()` on an already-numeric column is "a harmless no-op." That claim is not exercised by any test on this path — the existing histogram tests all use integer-valued properties (registered as `Numeric`) and live in separate, unrelated test methods. Parameterising this test to also run the already-numeric case (e.g., `breakdown_prop=10` as an integer) would assert the no-op claim directly and align with the team's preference for parameterised tests. **Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0)) How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

sampennington · 2026-05-17T14:55:03Z

Automated code review

Scope: posthog/hogql_queries/insights/trends/breakdown.py (+9/-1) and one new regression test. The fix and its rationale are sound — a histogram breakdown on a String-typed property ran minus() on strings and crashed with ILLEGAL_TYPE_OF_ARGUMENT. Coercing both the breakdown column and the actors-query bounds filter to a number with toFloat() is the right shape, and keeping the two sites symmetric (_get_breakdown_col_expr ↔ _get_actors_query_where_expr) is good.

Needs discussion — non-numeric strings still crash

breakdown.py:433 and :323 use toFloat(), which compiles to ClickHouse toFloat64(). That function throws (CANNOT_PARSE_TEXT / CANNOT_PARSE_NUMBER, exception code 6/72) on a value it can't parse — empty string, null-ish text, or genuinely non-numeric values like "abc". So this PR fixes the numeric-looking string case but converts a different population of inputs from one hard error into another.

Histogram breakdown is an opt-in numeric feature, so a fully non-numeric property is arguably user error — but a single stray non-numeric event value among thousands of numeric ones would now fail the entire query, which is a worse failure mode than today for that mixed-data case. Worth confirming the intended behavior:

If non-numeric rows should be dropped/bucketed-as-null, use toFloatOrNull() instead. The bin math (max - min, width_bucket-style) tolerates NULL, and toFloatOrNull of an already-numeric column is still a harmless no-op. The bounds filter at :323 would also behave correctly (NULL >= gte → false, row excluded).
If a hard error is genuinely desired for non-numeric data, that's defensible — but then it should be a validated, user-facing error (like the breakdown_histogram_bin_count math-type check already in this file) rather than a raw ClickHouse exception surfaced from system.query_log, which is exactly the class of bug this effort is trying to reduce.

My recommendation: switch both call sites to toFloatOrNull(). It strictly widens the set of inputs that succeed and removes the new crash surface, at no cost to the numeric path.

Functional gaps in test coverage

test_trends_histogram_breakdown_on_string_typed_property only covers cleanly-parseable numeric strings. Given the concern above, add cases for:

A property value that is non-numeric (e.g. "abc") or empty string — asserts the chosen behavior (graceful bucket vs. explicit error), and pins it so a future change can't silently regress it.
A mixed dataset (some numeric strings, some non-numeric) — the realistic system.query_log scenario.
A property value that is missing entirely on some events (NULL column) — confirms null handling through the histogram path.
Parameterize these per the repo's test conventions.

Minor

breakdown.py:307 is_numeric_breakdown is derived only when histogram_bin_count is an int, while _get_breakdown_col_expr:427 keys off histogram_bin_count is not None. Same effective condition today, but the two sites that must stay in lock-step now express the predicate differently — a one-line shared helper or matching the check would make the symmetry the PR depends on explicit.
The new comments are clear and appropriately explain why; no narration. Good.

Positive

Correct root-cause diagnosis (uncoerced property-type swapper path) and a genuine regression test that fails without the fix — the PR body's note that existing tests masked the bug via auto-registered Numeric PropertyDefinitions is a useful catch.
Both the aggregation column and the actors drill-in filter are fixed together, so the breakdown and its person-level drill-in stay consistent.

Verdict: Needs changes — prefer toFloatOrNull() at both sites (or an explicit validated error) plus non-numeric/mixed/null test cases before merge.

sampennington marked this pull request as ready for review May 17, 2026 13:09

greptile-apps Bot reviewed May 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(insights): coerce histogram breakdown property to a number#58701

fix(insights): coerce histogram breakdown property to a number#58701
sampennington wants to merge 1 commit into
masterfrom
sam/fix-histogram-breakdown-string-property

sampennington commented May 17, 2026

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

greptile-apps Bot commented May 17, 2026

Uh oh!

greptile-apps Bot May 17, 2026

Uh oh!

greptile-apps Bot May 17, 2026

Uh oh!

sampennington commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sampennington commented May 17, 2026

Problem

Changes

How did you test this code?

Publish to changelog?

🤖 Agent context

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

greptile-apps Bot commented May 17, 2026

Uh oh!

greptile-apps Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 17, 2026

Choose a reason for hiding this comment

Uh oh!

sampennington commented May 17, 2026

Automated code review

Needs discussion — non-numeric strings still crash

Functional gaps in test coverage

Minor

Positive

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant