-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Is this your first time submitting a feature request?
- I have read the expectations for open source contributors
- I have searched the existing issues, and I could not find an existing issue for this feature
- I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Improving ClickHouse's Materialized View Support in dbt
The dbt-clickhouse adapter https://github.com/ClickHouse/dbt-clickhouse allows the creation of Clickhouse-s materialized views using the dbt materialization materialized-view. Currently, both the materialized view and the target table are defined in the same SQL model, making it difficult to maintain. This is especially problematic for situations when several materialized views point to the same target table.
We want to decouple the definition of the materialized view and the target table where data will be stored:
┌──────────────┐ ┌──────────────────────┐ ┌──────────────┐
│ Source Table │ ───> │ Materialized View │ ───> │ Target Table │
└──────────────┘ └──────────────────────┘ └──────────────┘
- Both "Source Table" and "Target Table" can be defined with materialization
table. - To define the "Materialized View," we found some limitations as we'd like to add a new config in the materialized view to define which table this MV will write into.
Limitation 1: When executing ref() inside the config block, the current model is returned
If we execute it as:
{{ config(
materialized='materialized_view',
materialization_target_table=ref('events_aggregated')
) }}ref() inside the config block returns the current model instead of the referenced one. I'm not sure if this may be related to https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/context/providers.py#L678-L685
More context about this problem can be found here: https://discourse.getdbt.com/t/issue-passing-a-ref-table-name-to-a-custom-materialization/973/2
Limitation 2: In the dependency graph of the docs, the downstream dependency is drawn as an upstream one
So it looks like the MV will "read" from the table it's going to write into. It should be drawn in the same way as the first schema in this issue.
Limitation 3: Unit tests need to define data for the target table even when the model is not reading from it
Tables referenced with ref() need to have data defined for unit testing, but this MV implementation will not read data from the target table. In this case, we wouldn't need to force the addition of data in the unit test definition.
Proposal
Allow the use of ref() from the config block, allowing it to be marked as a downstream dependency. Something like:
{{ config(
materialized='materialized_view',
materialization_target_table=ref('events_aggregated', downstream=True)
) }}This way, we can:
- Use this new config to define the creation of the materialized view.
ref()correctly returns a reference to the target table, not a reference to the current file.- Documentation will correctly draw it as a downstream dependency.
- Unit tests will not require defining data for this model.
Describe alternatives you've considered
Currently, we have only found a way to work around the config definition limitation. The idea is to add a comment with the configuration and in the macro code parse the SQL to extract it:
-- materialization_target_table: {{ ref('events_aggregated') }}This allows defining this new config. The other two limitations are not overcome:
- The documentation draws this dependency as upstream, so users need to be aware that this dependency needs to be mentally managed in a different way.
- Unit tests require defining the target table, but this can be done just by defining it as empty.
Who will this benefit?
All users who use ClickHouse materialized views will be able to reason about, configure, and iterate independently on the tables and materialized views.
Are you interested in contributing this feature?
Yes
Anything else?
No response