Skip to content

[Feature] ref() improvements: support in config block and allow referencing downstream dependencies. #12319

@koletzilla

Description

@koletzilla

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Improving ClickHouse's Materialized View Support in dbt

The dbt-clickhouse adapter https://github.com/ClickHouse/dbt-clickhouse allows the creation of Clickhouse-s materialized views using the dbt materialization materialized-view. Currently, both the materialized view and the target table are defined in the same SQL model, making it difficult to maintain. This is especially problematic for situations when several materialized views point to the same target table.

We want to decouple the definition of the materialized view and the target table where data will be stored:

┌──────────────┐       ┌──────────────────────┐       ┌──────────────┐
│ Source Table │  ───> │ Materialized View    │  ───> │ Target Table │
└──────────────┘       └──────────────────────┘       └──────────────┘
  • Both "Source Table" and "Target Table" can be defined with materialization table.
  • To define the "Materialized View," we found some limitations as we'd like to add a new config in the materialized view to define which table this MV will write into.

Limitation 1: When executing ref() inside the config block, the current model is returned

If we execute it as:

{{ config(
  materialized='materialized_view',
  materialization_target_table=ref('events_aggregated')
) }}

ref() inside the config block returns the current model instead of the referenced one. I'm not sure if this may be related to https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/context/providers.py#L678-L685

More context about this problem can be found here: https://discourse.getdbt.com/t/issue-passing-a-ref-table-name-to-a-custom-materialization/973/2

Limitation 2: In the dependency graph of the docs, the downstream dependency is drawn as an upstream one

So it looks like the MV will "read" from the table it's going to write into. It should be drawn in the same way as the first schema in this issue.

Limitation 3: Unit tests need to define data for the target table even when the model is not reading from it

Tables referenced with ref() need to have data defined for unit testing, but this MV implementation will not read data from the target table. In this case, we wouldn't need to force the addition of data in the unit test definition.

Proposal

Allow the use of ref() from the config block, allowing it to be marked as a downstream dependency. Something like:

{{ config(
  materialized='materialized_view',
  materialization_target_table=ref('events_aggregated', downstream=True)
) }}

This way, we can:

  • Use this new config to define the creation of the materialized view.
  • ref() correctly returns a reference to the target table, not a reference to the current file.
  • Documentation will correctly draw it as a downstream dependency.
  • Unit tests will not require defining data for this model.

Describe alternatives you've considered

Currently, we have only found a way to work around the config definition limitation. The idea is to add a comment with the configuration and in the macro code parse the SQL to extract it:

-- materialization_target_table: {{ ref('events_aggregated') }}

This allows defining this new config. The other two limitations are not overcome:

  • The documentation draws this dependency as upstream, so users need to be aware that this dependency needs to be mentally managed in a different way.
  • Unit tests require defining the target table, but this can be done just by defining it as empty.

Who will this benefit?

All users who use ClickHouse materialized views will be able to reason about, configure, and iterate independently on the tables and materialized views.

Are you interested in contributing this feature?

Yes

Anything else?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions