Error bounds / probabilities / skewness as first-class Druid query results #7160

@leventov

Description

Describing Online Aggregation, I suggested that when the Broker sends partial results back to the client, it also sends a flag indicating that the partial aggregation results may be skewed. It could also send estimated errors / confidence intervals for the partial aggregation values, if it is able to compute them for the given aggregation function and if the user opts in to receiving such data.

I think this idea shouldn't be confined to partial query results during online aggregation; it could equally apply to "final" query results (the equivalent of "offline" query results).
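To make the proposal concrete, here is a minimal sketch (in Python, from a client's point of view) of what an error-annotated result row and an opt-in reliability check could look like. The errorBounds field, its keys, and the threshold are hypothetical illustrations of the idea, not an existing Druid API.

```python
# A result row as a Broker might return it, with hypothetical per-metric
# error metadata attached (field names are illustrative, not real Druid).
row = {
    "timestamp": "2019-03-01T00:00:00Z",
    "result": {
        "uniqueUsers": 10432.0,
        # Hypothetical error metadata the Broker could compute and attach:
        "errorBounds": {
            "uniqueUsers": {"stdError": 215.0, "ci95": [10010.6, 10853.4]},
        },
    },
}

def is_reliable(row, metric, max_relative_error=0.05):
    """Flag a metric whose estimated relative standard error exceeds
    a client-chosen threshold."""
    bounds = row["result"]["errorBounds"].get(metric)
    if bounds is None:
        return True  # no error info available; assume the value is exact
    return bounds["stdError"] / row["result"][metric] <= max_relative_error

print(is_reliable(row, "uniqueUsers"))  # 215/10432 ≈ 0.021 ≤ 0.05, so True
```

A UI could use such a check to render uncertain values differently (for example, greyed out or with error bars) instead of presenting all numbers as equally exact.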

Some of the sources of inconsistencies / error / variance:

  • Limitations of distributed query execution: see the issue regarding TopN aliasing (where @drcrallen gives a direct example of variance between topN results from different data nodes). See also a related join query discussion.
  • Time trends for single-valued query types such as topN and groupBy: relative results for different dimension values (grouping keys) may follow a time trend that is averaged out by the final aggregation and is thus invisible to the user.
  • Significant variance between different partitions within the same time interval might mean that there is simply not enough data to draw reliable conclusions from the final results. In some contexts this is OK (usually when making a topN or count query we are really interested in absolute values, for example count(log_lines) where error=true), but in other cases, namely when we are interested in proportions, relative values, and trends, we should at least make users aware that the results may include significant error.
  • The inherently probabilistic nature of many Druid aggregators such as quantiles, sketches, HLL, etc., including those that back classic SQL query types under the covers. See, for example, Inconsistencies in the result of the quantile aggregator #6099 and the discussion about switching the default implementation behind DISTINCT COUNT from one probabilistic structure to another.
  • Something else?
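The partition-variance point above can be illustrated numerically: given per-partition partial aggregates, the Broker (or a client) could estimate a rough confidence interval for a proportion from the spread across partitions and flag cases where the interval is too wide to support relative conclusions. The function below is an illustrative sketch, not Druid code; real partitions are not equally sized, so a production version would need to weight them.

```python
import statistics

def partition_rate_check(partition_counts, partition_totals, z=1.96):
    """Estimate an overall proportion and a rough normal-approximation
    confidence interval from the spread of per-partition proportions.
    Illustrative sketch only; assumes comparably sized partitions."""
    rates = [c / t for c, t in zip(partition_counts, partition_totals)]
    mean = statistics.fmean(rates)
    se = statistics.stdev(rates) / len(rates) ** 0.5
    return mean, (mean - z * se, mean + z * se)

# Per-partition counts of error=true log lines vs. total log lines.
# One partition (40/1000) diverges strongly from the others:
mean, (lo, hi) = partition_rate_check([5, 40, 3], [1000, 1000, 1000])
# The interval straddles zero, i.e. the relative value is unreliable.
print(mean, lo, hi)
```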
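For sketch-backed aggregators the error is often known analytically, so attaching bounds would cost almost nothing. For example, a HyperLogLog sketch with 2**lgK registers has a relative standard error of roughly 1.04 / sqrt(2**lgK). A minimal sketch of the arithmetic, assuming the standard HLL error formula (lgK = 12 here is just an example value, not a claim about Druid's default):

```python
import math

def hll_relative_std_error(lg_k):
    """Theoretical relative standard error of a HyperLogLog sketch with
    2**lg_k registers: approximately 1.04 / sqrt(2**lg_k)."""
    return 1.04 / math.sqrt(2 ** lg_k)

def hll_ci95(estimate, lg_k):
    """Rough 95% confidence interval around an HLL distinct-count estimate."""
    rse = hll_relative_std_error(lg_k)
    return estimate * (1 - 1.96 * rse), estimate * (1 + 1.96 * rse)

# With lgK = 12, the relative standard error is 1.04 / 64 ≈ 1.6%:
print(hll_relative_std_error(12))   # ≈ 0.01625
print(hll_ci95(1_000_000, 12))      # ≈ (968150.0, 1031850.0)
```

A DISTINCT COUNT backed by such a sketch could therefore return an interval alongside the point estimate with no extra query-time work.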

As with Online Aggregation, work would need to be done on both the backend (Druid itself) and the frontend side of UIs that query Druid to support this and bring value to users.

In terms of antifragility, Druid's current error-oblivious approach to query results may be classified as fragile. An approach that makes errors first-class query results might be classified as resilient, or perhaps even antifragile, because it could help users learn something new about their data during abrupt events.

FYI @gianm @mistercrunch @vogievetsky @julianhyde @leerho @weijietong
