Commit 8c7d958

Merge pull request #441 from dbt-labs/feature/calculate-sql-complexity
2 parents e6bbe8f + 2f63142 commit 8c7d958

10 files changed

Lines changed: 100 additions & 3 deletions

dbt_project.yml

Lines changed: 20 additions & 0 deletions
@@ -90,3 +90,23 @@ vars:

   # -- Code complexity variables --
   comment_chars: ["--"]
+  token_costs: {
+    "and": 0.1,
+    "or": 0.1,
+    "when": 0.5,
+    "coalesce": 1,
+    "distinct": 1,
+    "greatest": 1,
+    "least": 1,
+    "group": 1,
+    "join": 1,
+    "order": 1,
+    "select": 1,
+    "where": 1,
+    "having": 2,
+    "flatten": 3,
+    "unnest": 3,
+    "pivot": 3,
+    "partition by": 3,
+    "qualify": 3,
+  }

docs/customization/overriding-variables.md

Lines changed: 7 additions & 0 deletions
@@ -92,6 +92,13 @@ vars:
   chained_views_threshold: 8
 ```

+## SQL code analysis
+
+| variable | description | default |
+| ----------- | ----------- | ----------- |
+| `comment_chars` | a list of strings used for inline comments | `["--"]` |
+| `token_costs` | a dictionary of SQL tokens (words) and their associated complexity weights, <br>used to estimate model complexity | see the package's `dbt_project.yml` file |
+
 ## Execution

 | variable | description | default |

docs/customization/querying-columns.md renamed to docs/customization/querying-columns-names-and-descriptions.md

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,8 @@
-# Querying columns with SQL
+# Querying column names and descriptions with SQL

-The model `stg_columns` ([source](https://github.com/dbt-labs/dbt-project-evaluator/tree/main/models/staging/graph/stg_columns.sql)), created with the package, lists all the columns from all the dbt nodes (models, sources, tests, snapshots)
+The model `stg_columns` ([source](https://github.com/dbt-labs/dbt-project-evaluator/tree/main/models/staging/graph/stg_columns.sql)), created with the package, lists all the columns configured in all the dbt nodes (models, sources, tests, snapshots).
+
+It does not list columns that have not been explicitly added to the YAML files.

 You can use this model to help with questions such as:


docs/querying-the-dag.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ Building additional models and snapshots on top of this model could allow:

 ## Getting insights on potential refactoring work

+- identifying models with a lot of lines of code
+- identifying the models with the highest complexity, using the column `sql_complexity` from the table `int_all_graph_resources`, which is based on the weights defined in the `token_costs` variable
 - looking at the longest "chains" of models in a project
 - reviewing models with many/few direct dependents
 - identifying potential bottlenecks

macros/calculate_number_lines.sql

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{% macro calculate_number_lines(node) %}
+    {{ return(adapter.dispatch('calculate_number_lines', 'dbt_project_evaluator')(node)) }}
+{% endmacro %}
+
+{% macro default__calculate_number_lines(node) %}
+
+    {% if node.resource_type == 'model' %}
+
+        {% if execute %}
+        {%- set model_raw_sql = node.raw_sql or node.raw_code -%}
+        {%- else -%}
+        {%- set model_raw_sql = '' -%}
+        {%- endif -%}
+
+        {{ return(model_raw_sql.count("\n") + 1) }}
+
+    {% endif %}
+
+    {{ return(0) }}
+
+{% endmacro %}
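
The line count that `default__calculate_number_lines` appears to aim for is the number of newline characters plus one. A minimal Python sketch of that idea follows; the function name `number_lines` and the sample SQL are illustrative and not part of the package.

```python
def number_lines(raw_code: str) -> int:
    """Count lines as newline characters plus one (hypothetical helper)."""
    # An empty string contains no newline, so it still counts as one line.
    return raw_code.count("\n") + 1


# Illustrative model body with two newline characters, hence three lines.
model_sql = "select 1 as id\nfrom source_table\nwhere id > 0"
print(number_lines(model_sql))
```

Under this convention a trailing newline at the end of a file adds one to the count, which may or may not match what an editor reports.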
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+{% macro calculate_sql_complexity(node) %}
+    {{ return(adapter.dispatch('calculate_sql_complexity', 'dbt_project_evaluator')(node)) }}
+{% endmacro %}
+
+{% macro default__calculate_sql_complexity(node) %}
+
+    {% if node.resource_type == 'model' and node.language == 'sql' %}
+
+        {% if execute %}
+        {%- set model_raw_sql = node.raw_sql or node.raw_code -%}
+        {%- else -%}
+        {%- set model_raw_sql = '' -%}
+        {%- endif -%}
+
+        {%- set re = modules.re -%}
+        {%- set ns = namespace(complexity = 0) -%}
+
+        {# we remove comments that start with --, or with any other configured comment characters #}
+        {%- set comment_chars_match = "(" ~ var('comment_chars') | join("|") ~ ").*" -%}
+        {%- set model_raw_sql_no_comments = re.sub(comment_chars_match, '', model_raw_sql) -%}
+
+        {%- for token, token_cost in var('token_costs').items() -%}
+
+            {# this is not 100% perfect, but it roughly checks whether the token exists as a word by itself or is followed by "(", as for least()/greatest() #}
+            {%- set token_with_boundaries = "\\b" ~ token ~ "[\\t\\r\\n (]" -%}
+            {%- set all_regex_matches = re.findall(token_with_boundaries, model_raw_sql_no_comments, re.IGNORECASE) -%}
+            {%- set ns.complexity = ns.complexity + token_cost * (all_regex_matches | length) -%}
+
+        {%- endfor -%}
+
+        {{ return(ns.complexity) }}
+
+    {% endif %}
+
+    {{ return(0) }}
+
+{% endmacro %}
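
The scoring logic in `default__calculate_sql_complexity` can be tried outside dbt with Python's `re` module, which is what `modules.re` exposes in Jinja. The sketch below mirrors the macro's two steps (strip inline comments, then sum weighted token matches). The function name `sql_complexity`, the reduced weight table, and the sample query are assumptions made for illustration; only the regular expressions are taken from the macro.

```python
import re

# A subset of the default weights from dbt_project.yml, kept short for the example.
TOKEN_COSTS = {"and": 0.1, "when": 0.5, "join": 1, "select": 1, "where": 1, "partition by": 3}
COMMENT_CHARS = ["--"]


def sql_complexity(raw_sql, token_costs=TOKEN_COSTS, comment_chars=COMMENT_CHARS):
    """Weighted token count over SQL with inline comments removed (hypothetical helper)."""
    # Remove everything from a comment marker to the end of its line.
    comment_pattern = "(" + "|".join(comment_chars) + ").*"
    sql_no_comments = re.sub(comment_pattern, "", raw_sql)

    complexity = 0
    for token, cost in token_costs.items():
        # The token must start at a word boundary and be followed by
        # whitespace or an opening parenthesis, as in the macro.
        pattern = r"\b" + token + r"[\t\r\n (]"
        matches = re.findall(pattern, sql_no_comments, re.IGNORECASE)
        complexity += cost * len(matches)
    return complexity


# Illustrative query: one select, one join, one where and one "and".
# The word "select" inside the comment is stripped before counting.
example = """
select a.id, b.total -- select everything we need
from orders a
join payments b on a.id = b.order_id and b.status = 'ok'
where a.id > 0
"""
score = sql_complexity(example)
print(round(score, 2))
```

With these weights the example scores 1 + 1 + 1 + 0.1 = 3.1. Two limitations carry over from the pattern itself: a token at the very end of the text with no following character is not counted, and comment markers that are regex metacharacters would need escaping before being joined into the pattern.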

macros/unpack/get_node_values.sql

Lines changed: 4 additions & 0 deletions
@@ -11,6 +11,8 @@
     {%- for node in nodes_list -%}

         {%- set hard_coded_references = dbt_project_evaluator.find_all_hard_coded_references(node) -%}
+        {%- set number_lines = dbt_project_evaluator.calculate_number_lines(node) -%}
+        {%- set sql_complexity = dbt_project_evaluator.calculate_sql_complexity(node) -%}
         {%- set contract = node.contract.enforced if node.contract else false -%}
         {%- set exclude_node = dbt_project_evaluator.set_is_excluded(node, resource_type="node") -%}

@@ -40,6 +42,8 @@
                 "''" if not node.column_name else wrap_string_with_quotes(dbt.escape_single_quotes(node.column_name)),
                 wrap_string_with_quotes(node.meta | tojson),
                 wrap_string_with_quotes(dbt.escape_single_quotes(hard_coded_references)),
+                number_lines,
+                sql_complexity,
                 wrap_string_with_quotes(node.get('depends_on',{}).get('macros',[]) | tojson),
                 "cast(" ~ dbt_project_evaluator.is_not_empty_string(node.test_metadata) | trim ~ " as boolean)",
                 "cast(" ~ exclude_node ~ " as boolean)",

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ nav:
     - Configuring exceptions to the rules: customization/exceptions.md
     - Excluding packages and models/sources based on path: customization/excluding-packages-and-paths.md
     - Display issues in the logs: customization/issues-in-log.md
-    - Querying columns: customization/querying-columns.md
+    - Querying column names and descriptions: customization/querying-columns-names-and-descriptions.md
   - Run in CI Check: ci-check.md
   - Querying the DAG: querying-the-dag.md
   - Contributing: contributing.md

models/marts/core/int_all_graph_resources.sql

Lines changed: 2 additions & 0 deletions
@@ -112,6 +112,8 @@ joined as (
         unioned_with_calc.loader,
         unioned_with_calc.identifier,
         unioned_with_calc.hard_coded_references, -- NULL for non-model resources
+        unioned_with_calc.number_lines, -- NULL for non-model resources
+        unioned_with_calc.sql_complexity, -- NULL for non-model resources
         unioned_with_calc.is_excluded -- NULL for metrics and exposures

     from unioned_with_calc

models/staging/graph/stg_nodes.sql

Lines changed: 2 additions & 0 deletions
@@ -39,6 +39,8 @@ select
     cast(null as {{ dbt.type_string() }}) as column_name,
     cast(null as {{ dbt.type_string() }}) as meta,
     cast(null as {{ dbt.type_string() }}) as hard_coded_references,
+    cast(null as {{ dbt.type_int() }}) as number_lines,
+    cast(null as {{ dbt.type_float() }}) as sql_complexity,
     cast(null as {{ dbt.type_string() }}) as macro_dependencies,
     cast(True as boolean) as is_generic_test,
     cast(True as boolean) as is_excluded
