PromQL: Binary comparison on histogram_quantile() result fails with DataFusion field resolution error #8144

@WRAllen

What type of bug is this?

Unexpected error

What subsystems are affected?

Frontend

Minimal reproduce step

Description

Binary comparison operators (>=, >, <, etc.) and arithmetic operators (+, -) applied to histogram_quantile()
results fail with a DataFusion internal error. The histogram_quantile() function itself evaluates correctly, but any
binary operation on its output triggers a field resolution failure.

Environment

  • GreptimeDB version: v1.0.0-beta.4
  • Query interface: Prometheus-compatible API (/v1/prometheus/api/v1/query)
  • Client: vmalert v1.102.0

Steps to Reproduce

Execute the following PromQL queries in order:

# Step 1 - Works ✅
rate(inference_time_per_output_token_seconds_bucket[1m])

# Step 2 - Works ✅
sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))

# Step 3 - Works ✅
histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m])))

# Step 4 - Fails ❌
histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))) >= 0.02

# Step 5 - Also fails ❌
histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))) + 0
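The steps above can be replayed with a small script. This is only a sketch: the base URL (`http://localhost:4000`) is an assumption and should be replaced with the address of your GreptimeDB HTTP endpoint, and the helper names `build_request`, `summarize`, and `run` are made up for illustration. It sends steps 3 to 5 as form-encoded POST requests to the query path listed under Environment and prints either `success` or the returned `errorType` and `error`.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: adjust host and port to your deployment.
BASE_URL = "http://localhost:4000"
# Query path as listed in the Environment section of this report.
QUERY_PATH = "/v1/prometheus/api/v1/query"

INNER = (
    "histogram_quantile(0.5, sum by (le, infr_svc_uid) "
    "(rate(inference_time_per_output_token_seconds_bucket[1m])))"
)
QUERIES = {
    "step 3": INNER,               # reported as working
    "step 4": INNER + " >= 0.02",  # reported as failing
    "step 5": INNER + " + 0",      # reported as failing
}


def build_request(base_url, promql):
    """Build a form-encoded POST request carrying the PromQL expression."""
    data = urllib.parse.urlencode({"query": promql}).encode()
    return urllib.request.Request(base_url + QUERY_PATH, data=data, method="POST")


def summarize(body):
    """Reduce a Prometheus-style JSON response to a one-line summary."""
    if body.get("status") == "success":
        return "success"
    return f"{body.get('errorType')}: {body.get('error')}"


def run(base_url=BASE_URL):
    """Send each query and print its outcome (requires a reachable server)."""
    for name, promql in QUERIES.items():
        try:
            with urllib.request.urlopen(build_request(base_url, promql), timeout=10) as resp:
                body = json.load(resp)
        except urllib.error.HTTPError as err:
            # Error responses are assumed to carry a JSON body as well.
            body = json.load(err)
        print(name, "->", summarize(body))
```

Calling `run()` against an affected instance should show step 3 succeeding and steps 4 and 5 returning the `PlanQuery` error quoted under Error Messages.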

Error Messages

  Steps 4 and 5 (direct comparison/arithmetic):
  {
    "status": "error",
    "error": "Internal error during building DataFusion plan: No field named \"sum(prom_rate(time_range,value,time,Int64(60000)))\".",
    "errorType": "PlanQuery"
  }

When wrapped in a subquery (via vmalert):

  count_over_time((
    histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))) >= 0.02
  )[10m:1m])
  {
    "status": "error",
    "error": "Internal error during building DataFusion plan: No field named inference_time_per_output_token_seconds_bucket.infr_svc_uid. Did you mean 'infr_svc_uid'?.",
    "errorType": "PlanQuery"
  }

Expected Behavior

histogram_quantile(...) >= threshold should return a filtered time series (only the samples that satisfy the comparison),
consistent with standard PromQL behavior (as in Prometheus/VictoriaMetrics).
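For reference, the expected semantics of a vector-to-scalar comparison can be sketched in a few lines of Python. This is an illustration of standard PromQL behavior, not GreptimeDB code: `compare_ge` and the sample values are hypothetical. Without the `bool` modifier the comparison acts as a filter and keeps the original values. With `bool` (as in the `> bool 0` parts of the vmalert rule in the log output) every series is kept and its value becomes 0 or 1.

```python
def compare_ge(samples, threshold, bool_modifier=False):
    """Apply `vector >= scalar` to one instant vector.

    samples maps a series label (here just infr_svc_uid) to its value.
    """
    if bool_modifier:
        # `>= bool threshold`: keep every series, value becomes 1 or 0.
        return {k: 1.0 if v >= threshold else 0.0 for k, v in samples.items()}
    # Plain `>= threshold`: drop series that fail, keep the original value.
    return {k: v for k, v in samples.items() if v >= threshold}


# Hypothetical p50 values per infr_svc_uid, in seconds.
p50 = {"svc-a": 0.026, "svc-b": 0.0167}

print(compare_ge(p50, 0.02))                      # only svc-a remains
print(compare_ge(p50, 0.02, bool_modifier=True))  # both series, as 1.0 / 0.0
```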

Actual Behavior

DataFusion fails to resolve the output field name of histogram_quantile() when used as input to a binary operator. The
internal field reference appears to retain the raw function call signature or the source table name as a prefix, rather than
resolving to the aggregated output.

Workaround

Using cumulative bucket ratios to approximate percentile threshold checks:

  # Approximates histogram_quantile(0.5, ...) >= 0.02 (needs a bucket boundary at le="0.02")
  sum by (infr_svc_uid) (rate(metric_bucket{le="0.02"}[1m]))
  / sum by (infr_svc_uid) (rate(metric_bucket{le="+Inf"}[1m])) < 0.5
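Why the ratio check tracks the quantile check can be seen from a small Python sketch. It assumes classic cumulative buckets and the linear interpolation that Prometheus applies inside the selected bucket. The bucket counts are invented, and `histogram_quantile` and `ratio_check` here are simplified stand-ins (positive bounds, q > 0), not GreptimeDB's implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from [(upper_bound, cumulative_count), ...].

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls into +Inf: report the highest finite upper bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev = buckets[idx - 1] if idx > 0 else (0.0, 0.0)
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev) / (count - prev)


def ratio_check(buckets, threshold, q):
    """The workaround: fraction of observations <= threshold is below q."""
    return dict(buckets)[threshold] / buckets[-1][1] < q


# Hypothetical cumulative counts for le = 0.01, 0.02, 0.05, +Inf.
slow = [(0.01, 10), (0.02, 40), (0.05, 90), (math.inf, 100)]
fast = [(0.01, 30), (0.02, 60), (0.05, 90), (math.inf, 100)]
edge = [(0.01, 10), (0.02, 50), (0.05, 90), (math.inf, 100)]

for name, b in (("slow", slow), ("fast", fast), ("edge", edge)):
    print(name, histogram_quantile(0.5, b) >= 0.02, ratio_check(b, 0.02, 0.5))
```

The two checks agree for `slow` and `fast`. They differ for `edge`, where exactly half of the observations fall at or below 0.02: the interpolated p50 is then 0.02, so `>= 0.02` holds while the strict `< 0.5` ratio does not. The workaround also depends on the histogram having a bucket boundary at the threshold.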

Use Case

We are using vmalert to evaluate SLA alerting rules against GreptimeDB. The rules need to check if latency percentiles (p50,
p75, p90, p99) exceed defined thresholds, which requires binary comparisons on hist

What did you expect to see?

histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))) >= 0.02
should return a filtered time series (only the samples where p50 >= 0.02s), consistent with standard PromQL behavior in
Prometheus and VictoriaMetrics.

What did you see instead?

DataFusion internal error when applying any binary operator (>=, >, +, -) to histogram_quantile() output:

  1. Direct comparison returns:
  {"status":"error","error":"Internal error during building DataFusion plan: No field named \"sum(prom_rate(time_range,value,time,Int64(60000)))\".", "errorType":"PlanQuery"}
  2. When wrapped in a subquery via vmalert:
  {"status":"error","error":"Internal error during building DataFusion plan: No field named inference_time_per_output_token_seconds_bucket.infr_svc_uid. Did you mean 'infr_svc_uid'?.", "errorType":"PlanQuery"}

Note: histogram_quantile(...) alone evaluates correctly. The error only occurs when its result is used as input to a binary
operator.

What operating system did you use?

GreptimeDB v1.0.0-beta.4 deployed on Kubernetes (Linux amd64)

What version of GreptimeDB did you use?

v1.0.0-beta.4

Relevant log output and stack trace

vmalert log showing the full query and error response from GreptimeDB:

  2026-05-20T10:09:59.698Z error VictoriaMetrics/app/vmalert/rule/group.go:364
  group "inference-sla-critical": rule "InferenceSloViolationCritical": failed to execute:
  failed to execute query "(
    (count_over_time((
      sum by (infr_svc_uid) (rate(lm_gateway_request_errors_total{error_code=~\"5..|429\"}[1m]))
      / sum by (infr_svc_uid) (rate(lm_gateway_requests_total[1m])) >= 0.01
    )[10m:1m]) > bool 0)
    +
    (count_over_time((
      histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))) >= 0.02
    )[10m:1m]) > bool 0)
  ) >= 2
  "
  Response body:
  {"status":"error","error":"Internal error during building DataFusion plan: No field named inference_time_per_output_token_seconds_bucket.infr_svc_uid. Did you mean 'infr_svc_uid'?.","errorType":"PlanQuery"}

  # Simplified reproduction via GreptimeDB Prometheus API (no subquery):
  # POST /v1/prometheus/api/v1/query
  # query=histogram_quantile(0.5, sum by (le, infr_svc_uid) (rate(inference_time_per_output_token_seconds_bucket[1m]))) >= 0.02
  #
  # Response:
  # {"status":"error","error":"Internal error during building DataFusion plan: No field named \"sum(prom_rate(time_range,value,time,Int64(60000)))\".", "errorType":"PlanQuery"}

Labels

A-query (Involves code in query path), C-bug (Category Bugs)