Commit c9e9275

Merge pull request #386 from gastlich/get-column-values
feat: Inspect columns
2 parents 5dc21d6 + 779789f commit c9e9275

10 files changed

Lines changed: 130 additions & 5 deletions

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# Querying columns with SQL
+
+The model `stg_columns` ([source](https://github.com/dbt-labs/dbt-project-evaluator/tree/main/models/staging/graph/stg_columns.sql)), created by the package, lists all the columns from all the dbt nodes (models, sources, tests, snapshots).
+
+You can use this model to help with questions such as:
+
+- Are there columns with the same name in different nodes?
+- Do any columns in the YAML configuration lack descriptions?
+- Do any columns share the same name but have different descriptions?
+- Are there columns with names that match a specific pattern (regex)?
+- Have any prohibited names been used for columns?
+
+
+## Defining additional tests that match your exact requirements
+
+You can create a custom test against `{{ ref('stg_columns') }}` to check your own requirements. When running the package, make sure to also select the children of the package's models by using the `package:dbt_project_evaluator+` selector.
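
As an illustration of the custom test described in the new docs page, a singular test could look like the sketch below. The file name is hypothetical, and the sketch assumes that a missing description reaches `stg_columns` as either NULL or an empty string. The column names `node_unique_id`, `name`, and `description` are taken from the base models added in this commit.

```sql
-- tests/columns_missing_description.sql (hypothetical file name)
-- A dbt singular test fails when its query returns at least one row,
-- so this selects every column that has no usable description.
select
    node_unique_id,
    name
from {{ ref('stg_columns') }}
where description is null
   or trim(description) = ''
```

Because such a test is a child of `stg_columns`, a command like `dbt build --select package:dbt_project_evaluator+` should build the package models and then run it.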

integration_tests/models/dbt_project_evaluator_schema_tests/core.yml

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ models:
         tests:
           - unique
           - not_null
-
+
   - name: int_all_graph_resources
     description: "This table shows one record for each enabled resource in the graph and information about that resource."
     columns:
@@ -34,4 +34,4 @@ models:
       - name: unique_id
         tests:
           - unique
-          - not_null
+          - not_null

integration_tests/models/dbt_project_evaluator_schema_tests/graph.yml

Lines changed: 8 additions & 0 deletions
@@ -49,6 +49,14 @@ models:
           - unique
           - not_null
 
+  - name: stg_columns
+    description: "Staging model from the graph variable, one record per column resource."
+    tests:
+      - dbt_utils.unique_combination_of_columns:
+          combination_of_columns:
+            - node_unique_id
+            - name
+
   - name: stg_sources
     description: "Staging model from the graph variable, one record per source resource."
     columns:

macros/insert_resources_from_graph.sql

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-{% macro insert_resources_from_graph(relation, resource_type='nodes', relationships=False, batch_size=var('insert_batch_size') | int) %}
-    {%- set values = get_resource_values(resource_type, relationships) -%}
+{% macro insert_resources_from_graph(relation, resource_type='nodes', relationships=False, columns=False, batch_size=var('insert_batch_size') | int) %}
+    {%- set values = get_resource_values(resource_type, relationships, columns) -%}
     {%- set values_length = values | length -%}
     {%- set loop_count = (values_length / batch_size) | round(0, 'ceil') | int -%}

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+{%- macro get_column_values(node_type) -%}
+    {{ return(adapter.dispatch('get_column_values', 'dbt_project_evaluator')(node_type)) }}
+{%- endmacro -%}
+
+{%- macro default__get_column_values(node_type) -%}
+
+    {%- if execute -%}
+        {%- if node_type == 'nodes' %}
+            {% set nodes_list = graph.nodes.values() %}
+        {%- elif node_type == 'sources' -%}
+            {% set nodes_list = graph.sources.values() %}
+        {%- else -%}
+            {{ exceptions.raise_compiler_error("node_type needs to be either nodes or sources, got " ~ node_type) }}
+        {% endif -%}
+
+        {%- set values = [] -%}
+
+        {%- for node in nodes_list -%}
+            {%- for column in node.columns.values() -%}
+
+                {%- set values_line =
+                    [
+                        wrap_string_with_quotes(node.unique_id),
+                        wrap_string_with_quotes(dbt.escape_single_quotes(column.name)),
+                        wrap_string_with_quotes(dbt.escape_single_quotes(column.description)),
+                        wrap_string_with_quotes(dbt.escape_single_quotes(column.data_type)),
+                        wrap_string_with_quotes(dbt.escape_single_quotes(column.quote))
+                    ]
+                %}
+
+                {%- do values.append(values_line) -%}
+
+            {%- endfor -%}
+        {%- endfor -%}
+        {{ return(values) }}
+
+    {%- endif -%}
+
+{%- endmacro -%}

macros/unpack/get_resource_values.sql

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
-{% macro get_resource_values(resource=None, relationships=None) %}
+{% macro get_resource_values(resource=None, relationships=None, columns=None) %}
     {% if relationships %}
         {{ return(adapter.dispatch('get_relationship_values', 'dbt_project_evaluator')(node_type=resource)) }}
+    {% elif columns %}
+        {{ return(adapter.dispatch('get_column_values', 'dbt_project_evaluator')(node_type=resource)) }}
     {% elif resource == 'exposures' %}
         {{ return(adapter.dispatch('get_exposure_values', 'dbt_project_evaluator')()) }}
     {% elif resource == 'sources' %}

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ nav:
     - Configuring exceptions to the rules: customization/exceptions.md
     - Excluding packages and models/sources based on path: customization/excluding-packages-and-paths.md
     - Display issues in the logs: customization/issues-in-log.md
+    - Querying columns: customization/querying-columns.md
   - Run in CI Check: ci-check.md
   - Querying the DAG: querying-the-dag.md
   - Contributing: contributing.md
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+{{
+    config(
+        materialized='table',
+        post_hook="{{ insert_resources_from_graph(this, resource_type='nodes', columns=True) }}"
+    )
+}}
+
+{% if execute %}
+    {{ check_model_is_table(model) }}
+{% endif %}
+/* Bigquery won't let us `where` without `from` so we use this workaround */
+with dummy_cte as (
+    select 1 as foo
+)
+
+select
+    cast(null as {{ dbt.type_string() }}) as node_unique_id,
+    cast(null as {{ dbt.type_string()}}) as name,
+    cast(null as {{ dbt.type_string()}}) as description,
+    cast(null as {{ dbt.type_string()}}) as data_type,
+    cast(null as {{ dbt.type_string()}}) as quote
+
+from dummy_cte
+where false
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+{{
+    config(
+        materialized='table',
+        post_hook="{{ insert_resources_from_graph(this, resource_type='sources', columns=True) }}"
+    )
+}}
+
+{% if execute %}
+    {{ check_model_is_table(model) }}
+{% endif %}
+/* Bigquery won't let us `where` without `from` so we use this workaround */
+with dummy_cte as (
+    select 1 as foo
+)
+
+select
+    cast(null as {{ dbt.type_string() }}) as node_unique_id,
+    cast(null as {{ dbt.type_string()}}) as name,
+    cast(null as {{ dbt.type_string()}}) as description,
+    cast(null as {{ dbt.type_string()}}) as data_type,
+    cast(null as {{ dbt.type_string()}}) as quote
+
+from dummy_cte
+where false
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+with
+
+final as (
+
+    {{ dbt_utils.union_relations([
+        ref('base_node_columns'),
+        ref('base_source_columns')
+    ])}}
+)
+
+select * from final
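
Once the unioned model is built, the questions listed in the new docs page can be answered with ordinary SQL. The query below is a minimal sketch, not part of this commit, of the "same name but different descriptions" check. It assumes it runs inside a dbt project (for example as an analysis) so that `ref('stg_columns')` resolves, and that undocumented columns arrive with a NULL description.

```sql
-- Ad hoc query, not part of this commit.
-- Lists column names that are documented with more than one
-- distinct description across nodes and sources.
select
    name,
    count(distinct description) as distinct_descriptions,
    count(distinct node_unique_id) as nodes_using_name
from {{ ref('stg_columns') }}
where description is not null
group by name
having count(distinct description) > 1
order by distinct_descriptions desc
```

Filtering out NULL descriptions first keeps undocumented columns from counting as a conflicting description. Dropping the `where` clause and the `having` condition turns the same query into a count of how many nodes share each column name.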
