Pushdown percentile_cont() as quantile()#36

Merged
theory merged 1 commit into `main` from `push-percentile_cond` on Nov 11, 2025

@theory theory commented Nov 10, 2025

Detect use of `percentile_cont()`, an ordered-set aggregate, and push it
down to ClickHouse as `quantile()`. Like `quantile()`'s parameters, the
argument to `percentile_cont()` is evaluated only once, but the
arguments to `WITHIN GROUP (ORDER BY)` are evaluated for each row.

In examples seen in the wild, including [HouseClick] and [this blog
post], this appears to be a valid mapping, as long as there is no
`DESC`, `USING >`, or `NULLS FIRST`.

In the future we should be able to map `percentile_cont(float[])` to
`quantiles()`, though it will require converting the array argument to a
parameter list. It might also make sense to map `percentile_disc` to
`quantileExactHigh()`.
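
As a rough illustration of that future array conversion (not part of this PR), the sketch below flattens a list of fractions into a `quantiles()` parameter list. The function name `array_to_quantiles` is hypothetical, and the sketch assumes the fractions are already plain numeric constants:

```python
def array_to_quantiles(fractions, order_by_expr):
    """Render percentile_cont(ARRAY[...]) as a ClickHouse quantiles() call.

    Hypothetical helper: each array element becomes one entry in the
    parameter list, and the ORDER BY expression becomes the argument.
    """
    params = ", ".join(repr(f) for f in fractions)
    return f"quantiles({params})({order_by_expr})"


# Corresponds to:
#   percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY a)
print(array_to_quantiles([0.25, 0.5, 0.75], "a"))
```

A real implementation would also have to handle array arguments that are not constant literals, which this sketch ignores.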

To implement this syntax conversion, `deparseAggref()` borrows from
`postgres_fdw` to output the aggregate's direct arguments as the
ClickHouse parameter list and then the `WITHIN GROUP (ORDER BY)`
expressions as the regular arguments. Thus this PostgreSQL query:

```sql
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY a) FROM t1;
```

Maps to this ClickHouse query:

```sql
SELECT quantile(0.25)(a) FROM t1;
```
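
The shape of that rewrite can be sketched in a few lines of Python. This is only an illustration of the string layout, not the C code in `deparseAggref()`; the function name and the lookup table are hypothetical, and only the `percentile_cont` entry reflects this PR:

```python
def deparse_ordered_set_aggregate(name, direct_args, order_by_exprs):
    """Render an ordered-set aggregate in ClickHouse's parametric form.

    direct_args:    the arguments inside percentile_cont(...)
    order_by_exprs: the expressions inside WITHIN GROUP (ORDER BY ...)
    """
    # Hypothetical mapping table; only this one entry comes from the PR.
    clickhouse_names = {"percentile_cont": "quantile"}
    ch_name = clickhouse_names[name]
    params = ", ".join(direct_args)    # evaluated once -> parameter list
    args = ", ".join(order_by_exprs)   # evaluated per row -> arguments
    return f"{ch_name}({params})({args})"


print(deparse_ordered_set_aggregate("percentile_cont", ["0.25"], ["a"]))
```

The direct arguments land in the first pair of parentheses and the sort expressions in the second, which matches the two queries above.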

The non-default `ORDER BY` suffixes `DESC` and `NULLS FIRST` are not
supported and will raise an error.
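
A minimal sketch of that check, assuming a simplified representation of a sort key. The `SortKey` and `PushdownError` names and the error messages are hypothetical and are not taken from the extension:

```python
from dataclasses import dataclass


class PushdownError(Exception):
    """Hypothetical error type for sort clauses that cannot be mapped."""


@dataclass
class SortKey:
    expr: str
    descending: bool = False   # True for ORDER BY ... DESC
    nulls_first: bool = False  # True for ORDER BY ... NULLS FIRST


def check_sort_key(key):
    """Return the bare expression, or raise if a non-default suffix is set."""
    if key.descending:
        raise PushdownError(f"DESC is not supported in WITHIN GROUP: {key.expr}")
    if key.nulls_first:
        raise PushdownError(f"NULLS FIRST is not supported in WITHIN GROUP: {key.expr}")
    return key.expr


print(check_sort_key(SortKey("a")))
```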

For normal aggregates, `deparseAggref()` keeps its previous behavior.

  [HouseClick]: https://github.com/ClickHouse/HouseClick/blob/fa449b2/app/lib/analytics_queries.ts
  [this blog post]: https://clickhouse.com/blog/redshift-vs-clickhouse-comparison#ethereum-gas-used-by-week
    "Optimizing Analytical Workloads: Comparing Redshift vs ClickHouse"
@theory theory requested a review from serprex November 10, 2025 22:20
@theory theory self-assigned this Nov 10, 2025
@theory theory merged commit 087cfdc into main Nov 11, 2025
32 checks passed
@theory theory deleted the push-percentile_cond branch November 11, 2025 15:16