Make it easy to define/code/review a complex pipeline of transformation rules.
- The rules define a DAG/Directed-Acyclic-Graph, going from raw-data to complex measures. Intermediate nodes could also be used as measures.
- The DAG should be easily readable and modifiable but a human, not necessarily a developer.
- The DAG can express simple operations like
SUM
orPRODUCT
, and complex operations likeGROUPBY
.
Humans are generally happier when things goes faster. Adhoc
enables split-second queries over the underlying database. As the slow-part of a full AdhocQuery
is generally the
database pre-aggregation, we do not target absolute performance in Adhoc
. In other words, we prefer things to remains slower, as long as it enables
this project to remains simpler, given a query is generally slow due to the underlying database.
- SQLServer. We rely on many concepts from SQLServer to define our own abstractions.
- Apache Beam. Though Beam seems less flexible to access intermediate results as intermediate measures.
- MongoDB Aggregation Pipeline.
- DAX enables complex queries in Microsoft eco-system.
- SquashQL is an SQL query-engine for OLAP, with a strong emphasis on its typescript UI.
- Atoti PostProcessors
An AdhocQuery
is similar to a SELECT ... WHERE ... GROUP BY ...
SQL statement. It is defined by:
- a list of
groupBy
columns. - a set of
filter
clauses. - a list of measures, being either aggregated or transformed measures.
graph TB
sum -- haircut --> delta
sum -- haircut --> gamma
sum.FR -- country=France --> sum
ratio.FR --> sum.FR
ratio.FR --> sum
Adhoc is not a database, it is a query engine. It knows how to execute complex KPI queries, typically defined as complex graph of logics. The leaves of these graphes are simpler measures, to be provided by external databases.
Typical database are:
- CSV or Parquet files: Adhoc recommends querying local/remote CSV/Parquet files through DuckDb, with the JooqSqlDatabase.
- Any SQL database: you should rely on JooqSqlDatabase, possibly requiring a Professional or Enterprise JooQ license.
- ActivePivot/Atoti
Given databases may hold similar data but with different column names. A IAdhocDatabaseWrapper
enables coding once per database such a mapping.
A default IAdhocDatabaseWrapper
assumes IAdhocQuery
columns matches the IAdhocDatabaseWrapper
columns.
A measure can be:
- an aggregated measure (a column aggregated by an aggregation function)
- an transformed measure (one or multiple measures are mixed together, possibly with additional
filter
and/orgroupBys
).
Measures defines a Directed-Acyclic-Graph, where leaves are aggregated measures and nodes are transformed measures. The DAG is typically evaluated on a per-query basis, as the AdhocQuery groupBy
and filter
has to be combined with the own measures groupBys
and filters
.
Measures are evaluated for a coordinate, defined by the groupBy
and the filter
of its parent node. The root node have they groupBy
and filter
defined by the AdhocQuery.
- Combinator neither change the
groupBy
nor thefilter
. - Filtrator adds a
filter
, AND-ed with node ownfilter
. - Bucketor adds a
groupBy
, UNION-ed with node owngroupBy
.
Aggregations are used to reduce input data up to the requested (by groupBys
) granularity. Multiple aggregation functions may be applied over the same column.
See https://support.microsoft.com/en-us/office/aggregate-function-43b9278e-6aa7-4f17-92b6-e19993fa26df
On top of aggregated-measures, one can define transformation.
- Combinator: the simplest transformation evaluate a formula over underlying measures. (e.g.
sumMeasure=a+b
). - Filtrator: evaluate underlying measure with a coordinate when the filter is enforced. The node
filter
is AND-ed with themeasure
filter. Hence, if the query filterscountry=France
and the filtrator filterscountry=Germany
, then the result is empty. - Bucketor: evaluates the underlying measures with an additional groupBy, then aggregates up to the node granularity.
- Dispatchor: given an cell, it will contribute into multiple cells. Useful for
many-to-many
orrebucketing
.
In Analysis-Services, Many-to-Many is a feature enabling a fact (i.e. an input row) to contribute into multiple coordinate of a given column.
For instance, in a flatten GeographicalZone
column (e.g. having flattened a hierarchical Region->Country->City
), a single Paris
fact would contribute into Paris
, France
and Europe
.
This can be achieved in Adhoc with a Dispatchor
.
A full example is visible in TestManyToManyAdhocQuery
.
Sometimes, one wants to filter the visible members along some columns, without filtering the actual query. Typically, one may want
to query the ratio France/Europe
by filtering the France
country, without restricting Europe
to France
-only. For now, this can not be easily done.
The underlying issue is that one mah have a column filtering Country-with_firstLetterIsForG
. Assuming we have a measure returning currentCountry/Europe
where currentCountry
is the country on the Country
column, if we filter Country-with_firstLetterIsForG=true
in the query, should we show
France/(France+Germany)
or France/Europe
?
- We may introduce a special
groupBy
, where we would express we groupBycountry
but only showingCountry-with_firstLetterIsForG=true
- We may introduce a special
filter
, stating thatCountry-with_firstLetterIsForG=true
is aVisual
filter. It resomates with https://learn.microsoft.com/en-us/sql/mdx/visualtotals-mdx?view=sql-server-ver16