Commit 40755ee

Merge pull request #265 from dbt-labs/insert-materialization-for-large-projects
Insert materialization for large projects
2 parents 7203c67 + f760c0f commit 40755ee

24 files changed

Lines changed: 264 additions & 324 deletions

README.md

Lines changed: 8 additions & 5 deletions
@@ -23,18 +23,18 @@ Currently, the following adapters are supported:
 ### Cloning via dbt Package Hub

 Check [dbt Hub](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) for the latest installation instructions, or [read the docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
-### Additional setup for Databricks/Spark
+### Additional setup for Databricks/Spark/DuckDB/Redshift

 In your `dbt_project.yml`, add the following config:
 ```yml
 # dbt_project.yml

 dispatch:
-  - macro_namespace: dbt_utils
-    search_order: ['dbt_project_evaluator', 'spark_utils', 'dbt_utils']
+  - macro_namespace: dbt
+    search_order: ['dbt_project_evaluator', 'dbt']
 ```

-This is required because the project currently provides limited support for arrays macros for Databricks/Spark which is not part of `spark_utils` yet.
+This is required because the project currently overrides a small number of dbt core macros in order to ensure the project can run across the listed adapters. The overridden macros are in the [cross_db_shim directory](macros/cross_db_shim/).


 ### How It Works
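The `search_order` change above can be read as a lookup order for dispatched macros. The following is a minimal Python sketch of that idea, not dbt's actual implementation: it assumes namespaces are tried in the listed order and that, within each namespace, an adapter-prefixed macro is preferred over a `default__` one. The `available` registry and the `resolve_dispatch` helper are hypothetical names used only for illustration.

```python
def resolve_dispatch(macro_name, adapter_prefixes, search_order, available):
    """Return (namespace, macro) for the first candidate found, else None.

    Simplified model: outer loop over namespaces in search_order,
    inner loop over adapter prefixes (most specific first).
    """
    for namespace in search_order:
        for prefix in adapter_prefixes:
            candidate = f"{prefix}__{macro_name}"
            if candidate in available.get(namespace, set()):
                return namespace, candidate
    return None


# Hypothetical registry: the package ships a Redshift override,
# while dbt itself only has the default implementation.
available = {
    "dbt_project_evaluator": {"redshift__type_string"},
    "dbt": {"default__type_string"},
}
search_order = ["dbt_project_evaluator", "dbt"]

redshift_hit = resolve_dispatch(
    "type_string", ["redshift", "default"], search_order, available
)
snowflake_hit = resolve_dispatch(
    "type_string", ["snowflake", "default"], search_order, available
)
```

Under these assumptions a Redshift target picks up the package's override, while an adapter without an override falls through to the `dbt` namespace.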
@@ -1055,14 +1055,17 @@ vars:
 | variable | description | default |
 | ----------- | ----------- | ----------- |
 | `chained_views_threshold` | threshold for unacceptable length of chain of views for `fct_chained_views_dependencies` | 4 |
+| `insert_batch_size` | number of records inserted per batch when unpacking the graph into models | 10000 |

 ```yml
 # dbt_project.yml
-# set your chained views threshold to 8 instead of 4

 vars:
   dbt_project_evaluator:
+    # set your chained views threshold to 8 instead of 4
     chained_views_threshold: 8
+    # update the number of records inserted from the graph from 10,000 to 500 to reduce query size
+    insert_batch_size: 500
 ```
 </details>
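To make the trade-off behind `insert_batch_size` concrete, here is a small Python sketch of the arithmetic: a smaller batch size means smaller queries but more of them. The record count of 25,000 is a made-up figure, and `statement_count` is a hypothetical helper, not part of the package.

```python
import math


def statement_count(record_count: int, batch_size: int) -> int:
    """Number of insert statements needed to load record_count rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return math.ceil(record_count / batch_size)


record_count = 25_000  # hypothetical number of graph records

default_count = statement_count(record_count, 10_000)  # README default
reduced_count = statement_count(record_count, 500)     # value from the example above
```

With these numbers the default produces 3 statements of up to 10,000 rows each, while a batch size of 500 produces 50 much smaller statements.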

dbt_project.yml

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ vars:

   # -- Performance variables --
   chained_views_threshold: 5
+  insert_batch_size: "{{ 500 if target.type == 'bigquery' else 10000 }}"

   # -- Warehouse specific variables --
   max_depth_dag: 9
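The new default is a Jinja expression rather than a literal, so the effective batch size depends on the target. A rough Python sketch of the intended outcome follows. It assumes that a value set under `vars` in the consuming project takes precedence over this package default, and it mirrors the `| int` cast applied in the `insert_resources_from_graph` macro. Both function names are hypothetical.

```python
def default_insert_batch_size(target_type: str) -> int:
    """Mirror of the package default: 500 on BigQuery, 10000 elsewhere."""
    return 500 if target_type == "bigquery" else 10000


def effective_insert_batch_size(target_type: str, project_vars=None) -> int:
    """Assumed precedence: a project-level var wins over the package default."""
    project_vars = project_vars or {}
    raw = project_vars.get(
        "insert_batch_size", default_insert_batch_size(target_type)
    )
    # The macro casts the var with `| int`, so string values are accepted too.
    return int(raw)


bigquery_default = effective_insert_batch_size("bigquery")
snowflake_default = effective_insert_batch_size("snowflake")
overridden = effective_insert_batch_size("snowflake", {"insert_batch_size": "100"})
```

The last call corresponds to the `insert_batch_size: 100` override that this commit adds to the integration test project.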

integration_tests/dbt_project.yml

Lines changed: 2 additions & 1 deletion
@@ -63,4 +63,5 @@ vars:
   # dummy variable used for testing fct_hard_coded_references
   my_table_reference: 'grace_table'
   new_model_type_folder_name: 'my_new_models'
-  new_model_type_prefixes: 'nwmdl_'
+  new_model_type_prefixes: 'nwmdl_'
+  insert_batch_size: 100

integration_tests_2/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ Currently, this project is used to test the package behavior when:
 - there is no exposure
 - there is no metric
 - people don't override the default seed for `dbt_project_evaluator_exceptions`
+- people don't override the default value of `insert_batch_size`
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+{%- macro redshift__type_string() -%}
+  {{ "VARCHAR(600)" }}
+{%- endmacro %}
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{% macro insert_resources_from_graph(relation, resource_type='nodes', relationships=False, batch_size=var('insert_batch_size') | int) %}
+    {%- set values = get_resource_values(resource_type, relationships) -%}
+    {%- set values_length = values | length -%}
+    {%- set loop_count = (values_length / batch_size) | round(0, 'ceil') | int -%}
+
+    {%- for loop_number in range(loop_count) -%}
+        {%- set lower_bound = loop.index0 * batch_size -%}
+        {%- set upper_bound = loop.index * batch_size -%}
+        {%- set values_subset = values[lower_bound : upper_bound] %}
+        {%- set values_list_of_strings = [] -%}
+        {%- for indiv_values in values_subset %}
+            {%- do values_list_of_strings.append( indiv_values | join(", \n")) -%}
+        {%- endfor -%}
+        {%- set values_string = '(' ~ values_list_of_strings | join("), \n\n(") ~ ')' %}
+        {%- set insert_statement = "insert into " ~ relation ~ " values \n" ~ values_string ~ ";"%}
+        {% call statement('insert') -%}
+            {{ insert_statement }}
+        {%- endcall %}
+    {% endfor %}
+
+{% endmacro %}
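The batching logic in `insert_resources_from_graph` can be followed more easily outside Jinja. The Python sketch below mirrors the macro's steps (ceiling division for the loop count, slicing by lower and upper bound, joining rows into one `values` clause) but only builds the statements as strings instead of executing them. It assumes each row is a list of already-rendered SQL literals. The relation name and sample rows are invented for illustration.

```python
import math


def build_insert_statements(relation, rows, batch_size):
    """Return one insert statement per batch of rows.

    Each row is a list of SQL literal strings, e.g. ["'model.a'", "'a'"].
    """
    statements = []
    # Same as (values_length / batch_size) | round(0, 'ceil') | int
    loop_count = math.ceil(len(rows) / batch_size)
    for loop_index in range(loop_count):
        lower_bound = loop_index * batch_size
        upper_bound = (loop_index + 1) * batch_size
        subset = rows[lower_bound:upper_bound]
        # One comma-separated string per row
        rendered_rows = [", \n".join(row) for row in subset]
        # Wrap every row in parentheses to form the values clause
        values_string = "(" + "), \n\n(".join(rendered_rows) + ")"
        statements.append(f"insert into {relation} values \n{values_string};")
    return statements


# Hypothetical sample data: three rows, batch size of two
rows = [["'model.a'", "'a'"], ["'model.b'", "'b'"], ["'model.c'", "'c'"]]
statements = build_insert_statements("analytics.stg_nodes", rows, batch_size=2)
```

Three rows with a batch size of two yield two statements: the first carries two rows and the second carries the remaining one. An empty `rows` list yields no statements at all, which matches the macro's `range(loop_count)` loop running zero times.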

macros/select_from_values.sql

Lines changed: 0 additions & 154 deletions
This file was deleted.
Lines changed: 4 additions & 22 deletions
@@ -1,8 +1,8 @@
-{%- macro get_exposures() -%}
-    {{ return(adapter.dispatch('get_exposures', 'dbt_project_evaluator')()) }}
+{%- macro get_exposure_values() -%}
+    {{ return(adapter.dispatch('get_exposure_values', 'dbt_project_evaluator')()) }}
 {%- endmacro -%}

-{%- macro default__get_exposures() -%}
+{%- macro default__get_exposure_values() -%}

     {%- if execute -%}

@@ -33,24 +33,6 @@
         {%- endfor -%}
     {%- endif -%}

-    {{ return(
-        dbt_project_evaluator.select_from_values(
-            values = values,
-            columns = [
-                'unique_id',
-                'name',
-                'resource_type',
-                'file_path',
-                ('is_described', 'boolean'),
-                'exposure_type',
-                'maturity',
-                'package_name',
-                'url',
-                'owner_name',
-                'owner_email',
-                'meta'
-            ]
-        )
-    ) }}
+    {{ return(values) }}

 {%- endmacro -%}
Lines changed: 4 additions & 24 deletions
@@ -1,8 +1,8 @@
-{%- macro get_metrics() -%}
-    {{ return(adapter.dispatch('get_metrics', 'dbt_project_evaluator')()) }}
+{%- macro get_metric_values() -%}
+    {{ return(adapter.dispatch('get_metric_values', 'dbt_project_evaluator')()) }}
 {%- endmacro -%}

-{%- macro default__get_metrics() -%}
+{%- macro default__get_metric_values() -%}

     {%- if execute -%}
     {%- set nodes_list = graph.metrics.values() -%}
@@ -45,26 +45,6 @@
         {%- endfor -%}
     {%- endif -%}

-    {{ return(
-        dbt_project_evaluator.select_from_values(
-            values = values,
-            columns = [
-                'unique_id',
-                'name',
-                'resource_type',
-                'file_path',
-                ('is_described', 'boolean'),
-                'metric_type',
-                'model',
-                'label',
-                'sql',
-                'timestamp',
-                'package_name',
-                'dimensions',
-                'filters',
-                'meta'
-            ]
-        )
-    ) }}
+    {{ return(values) }}

 {%- endmacro -%}
Lines changed: 4 additions & 27 deletions
@@ -1,8 +1,8 @@
-{%- macro get_nodes() -%}
-    {{ return(adapter.dispatch('get_nodes', 'dbt_project_evaluator')()) }}
+{%- macro get_node_values() -%}
+    {{ return(adapter.dispatch('get_node_values', 'dbt_project_evaluator')()) }}
 {%- endmacro -%}

-{%- macro default__get_nodes() -%}
+{%- macro default__get_node_values() -%}

     {%- if execute -%}
     {%- set nodes_list = graph.nodes.values() -%}
@@ -39,29 +39,6 @@
         {%- endfor -%}
     {%- endif -%}

-    {{ return(
-        dbt_project_evaluator.select_from_values(
-            values = values,
-            columns = [
-                'unique_id',
-                'name',
-                'resource_type',
-                'file_path',
-                ('is_enabled', 'boolean'),
-                'materialized',
-                'on_schema_change',
-                'database',
-                'schema',
-                'package_name',
-                'alias',
-                ('is_described', 'boolean'),
-                'column_name',
-                'meta',
-                'hard_coded_references',
-                'macro_dependencies',
-                ('is_generic_test', 'boolean')
-            ]
-        )
-    ) }}
+    {{ return(values) }}

 {%- endmacro -%}
