Merged
4 changes: 3 additions & 1 deletion README.md
@@ -24,14 +24,16 @@ Currently, the following adapters are supported:
- AWS Athena (tested manually)
- Greenplum (tested manually)
- ClickHouse (tested manually)
- Microsoft Fabric Data Warehouse (tested manually)
- Microsoft Fabric Spark (tested manually)

## Using This Package

### Cloning via dbt Package Hub

Check [dbt Hub](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) for the latest installation instructions, or [read the docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.

### Additional setup for Databricks/Spark/DuckDB/Redshift/ClickHouse
### Additional setup for Databricks/Spark/DuckDB/Redshift/ClickHouse/Fabric

In your `dbt_project.yml`, add the following config:

20 changes: 10 additions & 10 deletions dbt_project.yml
@@ -26,17 +26,17 @@ dispatch:

models:
dbt_project_evaluator:
+materialized: "{{ 'table' if target.type in ['duckdb'] else 'view' }}"
+materialized: "{{ 'table' if target.type in ['duckdb', 'fabric'] else 'view' }}"
marts:
core:
int_all_graph_resources:
+materialized: table
int_direct_relationships:
# required for BigQuery and Redshift for performance/memory reasons
+materialized: "{{ 'table' if target.type in ['bigquery', 'redshift', 'databricks'] else 'view' }}"
# required for BigQuery, Redshift, Databricks, and Fabric for performance/memory reasons
+materialized: "{{ 'table' if target.type in ['bigquery', 'redshift', 'databricks', 'fabric'] else 'view' }}"
int_all_dag_relationships:
# required for BigQuery, Redshift, and Databricks for performance/memory reasons
+materialized: "{{ 'table' if target.type in ['bigquery', 'redshift', 'databricks', 'clickhouse'] else 'view' }}"
# required for BigQuery, Redshift, Databricks, Clickhouse, and Fabric for performance/memory reasons
+materialized: "{{ 'table' if target.type in ['bigquery', 'redshift', 'databricks', 'clickhouse', 'fabric'] else 'view' }}"
dag:
+materialized: table
staging:
@@ -45,11 +45,11 @@ models:
+materialized: table
variables:
stg_naming_convention_folders:
# required for Redshift because listagg runs only on tables
+materialized: "{{ 'table' if target.type == 'redshift' else 'view' }}"
# required for Redshift and Fabric because listagg runs only on tables
+materialized: "{{ 'table' if target.type in ['redshift', 'fabric'] else 'view' }}"
stg_naming_convention_prefixes:
# required for Redshift because listagg runs only on tables
+materialized: "{{ 'table' if target.type == 'redshift' else 'view' }}"
# required for Redshift and Fabric because listagg runs only on tables
+materialized: "{{ 'table' if target.type in ['redshift', 'fabric'] else 'view' }}"


vars:
@@ -89,7 +89,7 @@ vars:

# -- Execution variables --
insert_batch_size: "{{ 500 if target.type in ['athena', 'bigquery'] else 10000 }}"
max_depth_dag: "{{ 9 if target.type in ['bigquery', 'spark', 'databricks'] else 4 if target.type in ['athena', 'trino', 'clickhouse'] else -1 }}"
max_depth_dag: "{{ 9 if target.type in ['bigquery', 'spark', 'databricks', 'fabric'] else 4 if target.type in ['athena', 'trino', 'clickhouse'] else -1 }}"

# -- Code complexity variables --
comment_chars: ["--"]
4 changes: 2 additions & 2 deletions docs/customization/overriding-variables.md
@@ -103,14 +103,14 @@ vars:

| variable | description | default |
| ----------- | ----------- | ----------- |
| `max_depth_dag` | limits the maximum distance between nodes calculated in `int_all_dag_relationships` | 9 for bigquery and spark, -1 for other adatpters |
| `max_depth_dag` | limits the maximum distance between nodes calculated in `int_all_dag_relationships` | 9 for bigquery, spark, databricks, and fabric; 4 for athena, trino, and clickhouse; -1 for other adapters |
| `insert_batch_size` | number of records inserted per batch when unpacking the graph into models | 10000 |

**Note on max_depth_dag**

The default behavior for limiting the relationships calculated in the `int_all_dag_relationships` model differs depending on your adapter.

- For Bigquery & Spark/Databricks the maximum distance between two nodes in your DAG, calculated in `int_all_dag_relationships`, is set by the `max_depth_dag` variable, which is defaulted to 9. So by default, `int_all_dag_relationships` contains a row for every path less than or equal to 9 nodes in length between two nodes in your DAG. This is because these adapters do not currently support recursive SQL, and queries often fail on more than 9 recursive joins.
- For BigQuery, Spark/Databricks, and Microsoft Fabric Data Warehouse the maximum distance between two nodes in your DAG, calculated in `int_all_dag_relationships`, is set by the `max_depth_dag` variable, which is defaulted to 9. So by default, `int_all_dag_relationships` contains a row for every path less than or equal to 9 nodes in length between two nodes in your DAG. This is because these adapters do not currently support recursive SQL, and queries often fail on more than 9 recursive joins.
- For all other adapters, `int_all_dag_relationships` by default contains a row for every single path between two nodes in your DAG. If you experience long runtimes for the `int_all_dag_relationships` model, consider limiting the length of your generated DAG paths. To do this, set `max_depth_dag: {{ whatever limit you want to enforce }}`. The value of `max_depth_dag` must be greater than 2 for all DAG tests to work, and greater than `chained_views_threshold` for your performance tests to work. By default, the value of this variable for these adapters is -1, which the package interprets as "no limit".
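
As a rough illustration of the defaults described above, the following Python sketch mirrors the `max_depth_dag` expression from `dbt_project.yml` and the guard at the top of `fabric__recursive_dag` in this change. The function names `default_max_depth_dag` and `check_max_depth` are hypothetical and exist only for this example; the package itself evaluates these rules in Jinja.

```python
def default_max_depth_dag(target_type: str) -> int:
    """Mirror of the Jinja default for `max_depth_dag` in dbt_project.yml."""
    if target_type in ("bigquery", "spark", "databricks", "fabric"):
        return 9
    if target_type in ("athena", "trino", "clickhouse"):
        return 4
    return -1  # the package interprets -1 as "no limit"


def check_max_depth(max_depth: int, chained_views_threshold: int) -> None:
    """Mirror of the guard in the loop-based macro (hypothetical helper name)."""
    if max_depth < 2 or max_depth < chained_views_threshold:
        raise ValueError(
            "max_depth_dag must be at least 2 and must be greater than "
            "or equal to chained_views_threshold."
        )


if __name__ == "__main__":
    for adapter in ("fabric", "clickhouse", "postgres"):
        print(adapter, default_max_depth_dag(adapter))
```

Passing the threshold in as a parameter keeps the sketch independent of whatever value a project configures for `chained_views_threshold`.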

```yaml title="dbt_project.yml"
8 changes: 5 additions & 3 deletions docs/index.md
@@ -25,14 +25,16 @@ Currently, the following adapters are supported:
- AWS Athena (tested manually)
- Greenplum (tested manually)
- ClickHouse (tested manually)
- Microsoft Fabric Data Warehouse (tested manually)
- Microsoft Fabric Spark (tested manually)

## Using This Package

### Cloning via dbt Package Hub

Check [dbt Hub](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) for the latest installation instructions, or [read the docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.

### Additional setup for Databricks/Spark/DuckDB/Redshift
### Additional setup for Databricks/Spark/DuckDB/Redshift/Fabric

In your `dbt_project.yml`, add the following config:

@@ -64,8 +66,8 @@ Each test warning indicates the presence of a type of misalignment. To troublesh

## Limitations

### BigQuery and Databricks
### BigQuery, Databricks, and Microsoft Fabric Data Warehouse

BigQuery current support for recursive CTEs is limited and Databricks SQL doesn't support recursive CTEs.
BigQuery has limited support for recursive CTEs, while Databricks SQL and Microsoft Fabric Data Warehouse do not support them.

For those data warehouses, the model `int_all_dag_relationships` needs to be created by looping CTEs instead. The number of loops is configured with `max_depth_dag` and defaults to 9. This means that dependencies between models separated by more than 9 levels won't appear in `int_all_dag_relationships`, but tests on the DAG will still be correct. With more than 9 loops, BigQuery sometimes raises an error saying the query is too complex.
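
To make the looping approach concrete, here is a minimal Python sketch of the same idea: one "CTE" per level, each built by joining the previous level to the direct relationships, then all levels unioned. It follows the shape of the `cte_0` ... `cte_{n}` loop in `fabric__recursive_dag`, but the `Relationship` type, the `unrolled_dag` function, and the edge-list input are assumptions made for this example, not part of the package.

```python
from typing import List, NamedTuple, Tuple


class Relationship(NamedTuple):
    parent: str
    child: str
    distance: int
    path: Tuple[str, ...]


def unrolled_dag(edges: List[Tuple[str, str]], max_depth: int) -> List[Relationship]:
    """Build parent/child pairs level by level instead of recursively.

    `edges` is a list of (direct_parent, child) pairs.
    """
    nodes = sorted({node for edge in edges for node in edge})
    # Level 0: every node is related to itself at distance 0.
    level = [Relationship(n, n, 0, (n,)) for n in nodes]
    all_levels = list(level)
    # Levels 1 .. max_depth - 1: join the previous level to the direct edges.
    for _ in range(1, max_depth):
        level = [
            Relationship(rel.parent, child, rel.distance + 1, rel.path + (child,))
            for rel in level
            for direct_parent, child in edges
            if direct_parent == rel.child
        ]
        all_levels.extend(level)
    return all_levels


if __name__ == "__main__":
    chain = [("a", "b"), ("b", "c"), ("c", "d")]
    for rel in unrolled_dag(chain, max_depth=3):
        print(rel)
```

With the four-node chain above and `max_depth=3`, the pair `a` to `d` (three hops apart) is not produced, which is the truncation the limitation describes; raising `max_depth` to 4 includes it.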
8 changes: 8 additions & 0 deletions macros/cross_db_shim/bool_literal.sql
@@ -0,0 +1,8 @@
{# Convert a Python boolean to a SQL boolean literal appropriate for the target adapter #}
{% macro bool_literal(value) %}
{{ return(adapter.dispatch('bool_literal', 'dbt_project_evaluator')(value)) }}
{% endmacro %}

{% macro default__bool_literal(value) %}{{ value | trim }}{% endmacro %}

{% macro fabric__bool_literal(value) %}{% if value %}1{% else %}0{% endif %}{% endmacro %}
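
The effect of the dispatch above can be sketched in plain Python. This is only a model of the rendered output: `bool_literal` and `cast_boolean` are hypothetical stand-ins for the Jinja macros and the `"cast(" ~ ... ~ ")"` concatenation used in the unpack macros, and the boolean type names passed in (`bit`, `boolean`) are assumptions about what `dbt.type_boolean()` returns on each adapter.

```python
def bool_literal(value: bool, target_type: str) -> str:
    """Model of the dispatched macro: Fabric gets 1/0, others get the rendered boolean."""
    if target_type == "fabric":
        return "1" if value else "0"
    # Jinja renders a Python boolean as "True" / "False"; the macro then trims it.
    return str(value).strip()


def cast_boolean(value: bool, target_type: str, boolean_type: str) -> str:
    """Model of the cast expression assembled in the unpack macros."""
    return f"cast({bool_literal(value, target_type)} as {boolean_type})"


if __name__ == "__main__":
    print(cast_boolean(True, "fabric", "bit"))
    print(cast_boolean(False, "postgres", "boolean"))
```

Routing every boolean through one macro means an adapter without `TRUE`/`FALSE` keywords only needs a single override rather than a change at each call site.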
7 changes: 7 additions & 0 deletions macros/cross_db_shim/escape_single_quotes.sql
@@ -0,0 +1,7 @@
{% macro spark__escape_single_quotes(expression) -%}
{{ expression | replace("'","\\'") }}
{%- endmacro %}

{% macro fabric__escape_single_quotes(expression) -%}
{{ expression | replace("'","''") }}
{%- endmacro %}
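
The two overrides differ only in how a single quote is escaped inside a SQL string literal. A small Python sketch of the same string replacements, using a hypothetical `escape_single_quotes` helper that covers only the two adapters shown in this file:

```python
def escape_single_quotes(expression: str, target_type: str) -> str:
    """Model of the two adapter-specific overrides shown above."""
    if target_type == "spark":
        # Spark SQL: prefix the quote with a backslash.
        return expression.replace("'", "\\'")
    if target_type == "fabric":
        # T-SQL: double the quote.
        return expression.replace("'", "''")
    raise NotImplementedError(f"no override sketched for {target_type!r}")


if __name__ == "__main__":
    print(escape_single_quotes("customer's orders", "spark"))
    print(escape_single_quotes("customer's orders", "fabric"))
```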
7 changes: 7 additions & 0 deletions macros/cross_db_shim/quote_identifier.sql
@@ -0,0 +1,7 @@
{% macro quote_identifier(name) %}
{{ return(adapter.dispatch('quote_identifier', 'dbt_project_evaluator')(name)) }}
{% endmacro %}

{% macro default__quote_identifier(name) %}{{ name }}{% endmacro %}

{% macro fabric__quote_identifier(name) %}[{{ name }}]{% endmacro %}
3 changes: 0 additions & 3 deletions macros/cross_db_shim/spark_shims.sql

This file was deleted.

4 changes: 4 additions & 0 deletions macros/cross_db_shim/type_string.sql
@@ -9,3 +9,7 @@
{%- macro redshift__type_string_dpe() -%}
{{ return(api.Column.string_type(600)) }}
{%- endmacro -%}

{%- macro fabric__type_string_dpe() -%}
{{ return("varchar(8000)") }}
{%- endmacro -%}
17 changes: 16 additions & 1 deletion macros/get_directory_pattern.sql
@@ -23,6 +23,10 @@
{% endmacro %}

{% macro get_dbtreplace_directory_pattern() %}
{{ return(adapter.dispatch('get_dbtreplace_directory_pattern', 'dbt_project_evaluator')()) }}
{% endmacro %}

{% macro default__get_dbtreplace_directory_pattern() %}
{% if execute %}
{%- set on_mac_or_linux = dbt_project_evaluator.is_os_mac_or_linux() -%}
{%- if on_mac_or_linux -%}
@@ -31,4 +35,15 @@
{{ dbt.replace("file_path", "regexp_replace(file_path,'.*\\\\\\\\','')", "''") }}
{% endif %}
{% endif %}
{% endmacro %}
{% endmacro %}

{% macro fabric__get_dbtreplace_directory_pattern() %}
{% if execute %}
{%- set on_mac_or_linux = dbt_project_evaluator.is_os_mac_or_linux() -%}
{%- if on_mac_or_linux -%}
left(file_path, len(file_path) - charindex('/', reverse(file_path)))
{%- else -%}
left(file_path, len(file_path) - charindex('\', reverse(file_path)))
{%- endif -%}
{% endif %}
{% endmacro %}
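
The Fabric variant avoids `regexp_replace` by locating the last path separator with `reverse` and `charindex`. The Python sketch below emulates that T-SQL expression to show what it evaluates to; `charindex` and `fabric_directory_path` are hypothetical helpers written for this example, with `charindex` modelled as 1-based and returning 0 when the separator is absent.

```python
def charindex(needle: str, haystack: str) -> int:
    """Emulate T-SQL CHARINDEX: 1-based position, 0 when not found."""
    return haystack.find(needle) + 1


def fabric_directory_path(file_path: str, separator: str = "/") -> str:
    """Emulate left(file_path, len(file_path) - charindex(sep, reverse(file_path)))."""
    position_from_end = charindex(separator, file_path[::-1])
    return file_path[: len(file_path) - position_from_end]


if __name__ == "__main__":
    print(fabric_directory_path("models/staging/stg_a.sql"))
    print(fabric_directory_path("models\\staging\\stg_a.sql", separator="\\"))
    print(fabric_directory_path("stg_a.sql"))
```

Two properties of the expression are worth noting: the result carries no trailing separator, and a path with no separator at all comes back unchanged because `charindex` yields 0.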
4 changes: 4 additions & 0 deletions macros/is_not_empty_string.sql
@@ -10,4 +10,8 @@
{{ false }}
{% endif %}

{% endmacro %}

{% macro fabric__is_not_empty_string(str) %}
{% if str %}1{% else %}0{% endif %}
{% endmacro %}
119 changes: 119 additions & 0 deletions macros/recursive_dag.sql
@@ -257,6 +257,125 @@ with direct_relationships as (
{% endmacro %}


{% macro fabric__recursive_dag() %}

{% set max_depth = var('max_depth_dag') | int %}
{% if max_depth < 2 or max_depth < var('chained_views_threshold') | int %}
{% do exceptions.raise_compiler_error(
'Variable max_depth_dag must be at least 2 and must be greater than or equal to chained_views_threshold.'
) %}
{% endif %}

with direct_relationships as (
select
*
from {{ ref('int_direct_relationships') }}
where resource_type <> 'test'
)

, get_distinct as (
select distinct
resource_id as parent_id,
resource_id as child_id,
resource_name,
materialized as child_materialized,
is_public as child_is_public,
access as child_access,
is_excluded as child_is_excluded

from direct_relationships
)

, cte_0 as (
select
parent_id,
child_id,
child_materialized,
child_is_public,
child_access,
child_is_excluded,
0 as distance,
cast({{ dbt.array_construct(['resource_name']) }} as varchar(max)) as path,
cast(null as {{ dbt.type_boolean() }}) as is_dependent_on_chain_of_views
from get_distinct
)

{% for i in range(1, max_depth) %}
{% set prev_cte_path %}cte_{{ i - 1 }}.path{% endset %}
, cte_{{ i }} as (
select
cte_{{ i - 1 }}.parent_id as parent_id,
direct_relationships.resource_id as child_id,
direct_relationships.materialized as child_materialized,
direct_relationships.is_public as child_is_public,
direct_relationships.access as child_access,
direct_relationships.is_excluded as child_is_excluded,
cte_{{ i - 1 }}.distance+1 as distance,
cast({{ dbt.array_append(prev_cte_path, 'direct_relationships.resource_name') }} as varchar(max)) as path,
case
when
cte_{{ i - 1 }}.child_materialized in ('view', 'ephemeral')
and coalesce(cte_{{ i - 1 }}.is_dependent_on_chain_of_views, cast(1 as bit)) = cast(1 as bit)
then cast(1 as bit)
else cast(0 as bit)
end as is_dependent_on_chain_of_views

from direct_relationships
inner join cte_{{ i - 1 }}
on cte_{{ i - 1 }}.child_id = direct_relationships.direct_parent_id
)
{% endfor %}

, all_relationships_unioned as (
{% for i in range(max_depth) %}
select * from cte_{{ i }}
{% if not loop.last %}union all{% endif %}
{% endfor %}
)

, resource_info as (
select * from {{ ref('int_all_graph_resources') }}
)

, all_relationships as (
select
parent.resource_id as parent_id,
parent.resource_name as parent,
parent.resource_type as parent_resource_type,
parent.model_type as parent_model_type,
parent.materialized as parent_materialized,
parent.is_public as parent_is_public,
parent.access as parent_access,
parent.source_name as parent_source_name,
parent.file_path as parent_file_path,
parent.directory_path as parent_directory_path,
parent.file_name as parent_file_name,
parent.is_excluded as parent_is_excluded,
child.resource_id as child_id,
child.resource_name as child,
child.resource_type as child_resource_type,
child.model_type as child_model_type,
child.materialized as child_materialized,
child.is_public as child_is_public,
child.access as child_access,
child.source_name as child_source_name,
child.file_path as child_file_path,
child.directory_path as child_directory_path,
child.file_name as child_file_name,
child.is_excluded as child_is_excluded,
cast(all_relationships_unioned.distance as {{ dbt.type_int() }}) as distance,
all_relationships_unioned.path,
case when all_relationships_unioned.is_dependent_on_chain_of_views = cast(1 as bit) then cast(1 as bit) else cast(0 as bit) end as is_dependent_on_chain_of_views
from all_relationships_unioned
left join resource_info as parent
on all_relationships_unioned.parent_id = parent.resource_id
left join resource_info as child
on all_relationships_unioned.child_id = child.resource_id
)

{% endmacro %}


{% macro clickhouse__recursive_dag() %}
{{ return(bigquery__recursive_dag()) }}
{% endmacro %}
2 changes: 1 addition & 1 deletion macros/unpack/get_column_values.sql
@@ -25,7 +25,7 @@
wrap_string_with_quotes(dbt.escape_single_quotes(column.description | replace("\\","\\\\"))),
wrap_string_with_quotes(dbt.escape_single_quotes(column.data_type)),
wrap_string_with_quotes(dbt.escape_single_quotes(tojson(column.constraints))),
column.constraints | selectattr('type', 'equalto', 'not_null') | list | length > 0,
"cast(" ~ dbt_project_evaluator.bool_literal(column.constraints | selectattr('type', 'equalto', 'not_null') | list | length > 0) | trim ~ " as " ~ dbt.type_boolean() ~ ")",
column.constraints | length,
wrap_string_with_quotes(dbt.escape_single_quotes(column.quote))
]
6 changes: 3 additions & 3 deletions macros/unpack/get_node_values.sql
@@ -23,15 +23,15 @@
wrap_string_with_quotes(node.name),
wrap_string_with_quotes(node.resource_type),
wrap_string_with_quotes(node.original_file_path | replace("\\","\\\\")),
"cast(" ~ node.config.enabled | trim ~ " as " ~ dbt.type_boolean() ~ ")",
"cast(" ~ dbt_project_evaluator.bool_literal(node.config.enabled) | trim ~ " as " ~ dbt.type_boolean() ~ ")",
wrap_string_with_quotes(node.config.materialized),
wrap_string_with_quotes(node.config.on_schema_change),
wrap_string_with_quotes(node.group),
wrap_string_with_quotes(node.access),
wrap_string_with_quotes(node.latest_version),
wrap_string_with_quotes(node.version),
wrap_string_with_quotes(node.deprecation_date),
"cast(" ~ contract | trim ~ " as " ~ dbt.type_boolean() ~ ")",
"cast(" ~ dbt_project_evaluator.bool_literal(contract) | trim ~ " as " ~ dbt.type_boolean() ~ ")",
node.columns.values() | list | length,
node.columns.values() | list | selectattr('description') | list | length,
wrap_string_with_quotes(node.database),
@@ -46,7 +46,7 @@
sql_complexity,
wrap_string_with_quotes(node.get('depends_on',{}).get('macros',[]) | tojson),
"cast(" ~ dbt_project_evaluator.is_not_empty_string(node.test_metadata) | trim ~ " as " ~ dbt.type_boolean() ~ ")",
"cast(" ~ exclude_node ~ " as " ~ dbt.type_boolean() ~ ")",
"cast(" ~ dbt_project_evaluator.bool_literal(exclude_node) | trim ~ " as " ~ dbt.type_boolean() ~ ")",
]
%}

8 changes: 4 additions & 4 deletions macros/unpack/get_relationship_values.sql
@@ -21,12 +21,12 @@

{%- if node.get('depends_on',{}).get('nodes',[]) |length == 0 -%}

{%- set values_line =
{%- set values_line =
[
"cast('" ~ node.unique_id ~ "' as " ~ dbt_project_evaluator.type_string_dpe() ~ ")",
"cast(NULL as " ~ dbt_project_evaluator.type_string_dpe() ~ ")",
"FALSE",
]
"cast(" ~ dbt_project_evaluator.bool_literal(false) | trim ~ " as " ~ dbt.type_boolean() ~ ")",
]
%}

{%- do values.append(values_line) -%}
@@ -42,7 +42,7 @@
[
"cast('" ~ node.unique_id ~ "' as " ~ dbt_project_evaluator.type_string_dpe() ~ ")",
"cast('" ~ parent ~ "' as " ~ dbt_project_evaluator.type_string_dpe() ~ ")",
"" ~ is_primary ~ "" if node.unique_id.split('.')[0] == 'test' else "FALSE"
"cast(" ~ dbt_project_evaluator.bool_literal(is_primary if node.unique_id.split('.')[0] == 'test' else false) | trim ~ " as " ~ dbt.type_boolean() ~ ")"
]
%}
