Commit ceeb800

Support for PostgreSQL
1 parent 2c99743 commit ceeb800

8 files changed, with 40 additions and 16 deletions

README.md

Lines changed: 13 additions & 13 deletions
@@ -4,29 +4,29 @@ A package for dbt which enables standardization of data sets. You can use it to

 The package contains a set of macros that mirror the functionality of the [scikit-learn preprocessing module](https://scikit-learn.org/stable/modules/preprocessing.html). Originally they were developed as part of the 2019 Medium article [Feature Engineering in Snowflake](https://medium.com/omnata/feature-engineering-in-snowflake-4312032e0d53).

-Currently they have been tested in Snowflake, Redshift , BigQuery, and SQL Server. The test case expectations have been built using scikit-learn (see *.py in [integration_tests/data/sql](integration_tests/data/sql)), so you can expect behavioural parity with it.
+Currently they have been tested in Snowflake, Redshift, BigQuery, SQL Server, and PostgreSQL 13.2. The test case expectations have been built using scikit-learn (see *.py in [integration_tests/data/sql](integration_tests/data/sql)), so you can expect behavioural parity with it.

 The macros are:

-| scikit-learn function | macro name | Snowflake | BigQuery | Redshift | MSSQL | Example |
-| --- | --- | --- | --- | --- | --- | --- |
-| [KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer)| k_bins_discretizer | Y | Y | Y | N | ![example](images/k_bins.gif) |
-| [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder)| label_encoder | Y | Y | Y | Y | ![example](images/label_encoder.gif) |
-| [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler) | max_abs_scaler | Y | Y | Y | Y | [![example](images/max_abs_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#maxabsscaler) |
-| [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler) | min_max_scaler | Y | Y | Y | N | [![example](images/min_max_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#minmaxscaler) |
-| [Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer) | normalizer | Y | Y | Y | Y | [![example](images/normalizer.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#normalizer) |
-| [OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder) | one_hot_encoder | Y | Y | Y | Y | ![example](images/one_hot_encoder.gif) |
-| [QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer) | quantile_transformer | Y | Y | N | N | [![example](images/quantile_transformer.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#quantiletransformer-uniform-output) |
-| [RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler) | robust_scaler | Y | Y | Y | N | [![example](images/robust_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#robustscaler) |
-| [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) | standard_scaler | Y | Y | Y | N | [![example](images/standard_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#standardscaler) |
+| scikit-learn function | macro name | Snowflake | BigQuery | Redshift | MSSQL | PostgreSQL | Example |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| [KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer)| k_bins_discretizer | Y | Y | Y | N | Y | ![example](images/k_bins.gif) |
+| [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder)| label_encoder | Y | Y | Y | Y | Y | ![example](images/label_encoder.gif) |
+| [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler) | max_abs_scaler | Y | Y | Y | Y | Y | [![example](images/max_abs_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#maxabsscaler) |
+| [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler) | min_max_scaler | Y | Y | Y | N | Y | [![example](images/min_max_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#minmaxscaler) |
+| [Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer) | normalizer | Y | Y | Y | Y | Y | [![example](images/normalizer.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#normalizer) |
+| [OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder) | one_hot_encoder | Y | Y | Y | Y | Y | ![example](images/one_hot_encoder.gif) |
+| [QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer) | quantile_transformer | Y | Y | N | N | Y | [![example](images/quantile_transformer.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#quantiletransformer-uniform-output) |
+| [RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler) | robust_scaler | Y | Y | Y | N | Y | [![example](images/robust_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#robustscaler) |
+| [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) | standard_scaler | Y | Y | Y | N | Y | [![example](images/standard_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#standardscaler) |

 _\* 2D charts taken from [scikit-learn.org](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html), GIFs are my own_
 ## Installation
 To use this in your dbt project, create or modify packages.yml to include:
 ```
 packages:
   - package: "omnata-labs/dbt_ml_preprocessing"
-    version: [">=1.0.1"]
+    version: [">=1.0.2"]
 ```
 _(replace the revision number with the latest)_


dbt_project.yml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 name: 'dbt_ml_preprocessing'
-version: '1.0.1'
+version: '1.0.2'

 require-dbt-version: ">=0.15.1"


integration_tests/macros/equality_with_numeric_tolerance.sql

Lines changed: 4 additions & 0 deletions
@@ -60,6 +60,10 @@ where percent_difference > {{ percentage_tolerance }}
 {% do return( redshift__test_equality_with_numeric_tolerance(model,compare_model,source_join_column,target_join_column,source_numeric_column_name,target_numeric_column_name,percentage_tolerance,output_all_rows=False)) %}
 {% endmacro %}

+{% macro postgres__test_equality_with_numeric_tolerance(model,compare_model,source_join_column,target_join_column,source_numeric_column_name,target_numeric_column_name,percentage_tolerance,output_all_rows=False) %}
+{% do return( redshift__test_equality_with_numeric_tolerance(model,compare_model,source_join_column,target_join_column,source_numeric_column_name,target_numeric_column_name,percentage_tolerance,output_all_rows=False)) %}
+{% endmacro %}
+

 {% macro snowflake__test_equality_with_numeric_tolerance(model,compare_model,source_join_column,target_join_column,source_numeric_column_name,target_numeric_column_name,percentage_tolerance,output_all_rows=False) %}
 {% set compare_cols_csv = compare_columns | join(', ') %}

integration_tests/macros/quantile_transformer_model_macro.sql

Lines changed: 7 additions & 0 deletions
@@ -5,6 +5,13 @@ with data as (
 select * from data
 {% endmacro %}

+{% macro postgres__quantile_transformer_model_macro() %}
+with data as (
+{{ dbt_ml_preprocessing.quantile_transformer( ref('data_quantile_transformer') ,'col_to_transform') }}
+)
+select * from data
+{% endmacro %}
+
 -- macro not supported in other databases
 {% macro default__quantile_transformer_model_macro() %}
 select 1 from (select 1) where 1=2 -- empty result set so that test passes

integration_tests/macros/test_quantile_transformer_result_with_tolerance.sql

Lines changed: 5 additions & 0 deletions
@@ -19,3 +19,8 @@ select 1 from (select 1) where 1=2 -- empty result set so that test passes
 {% macro sqlserver__test_quantile_transformer_result_with_tolerance() %}
 select null as '1' where 1=2 -- empty result set so that test passes
 {% endmacro %}
+
+-- testing macro not supported in postgres
+{% macro postgres__test_quantile_transformer_result_with_tolerance() %}
+select null where 1=2 -- empty result set so that test passes
+{% endmacro %}

integration_tests/models/sql/schema.yml

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ models:
 target_join_column: id_col
 source_numeric_column_name: col_to_scale_scaled
 target_numeric_column_name: col_to_scale_scaled
-percentage_tolerance: 0.00000001
+percentage_tolerance: 0.009

 - name: test_min_max_scaler_with_column_selection
 tests:
@@ -29,7 +29,7 @@ models:
 target_join_column: id_col
 source_numeric_column_name: col_to_scale_scaled
 target_numeric_column_name: col_to_scale_scaled
-percentage_tolerance: 0.00000001
+percentage_tolerance: 0.009

 - name: test_k_bins_discretizer_default_bins
 tests:
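
The hunks above loosen `percentage_tolerance` from 0.00000001 to 0.009. The commit does not say why, and the formula behind `percent_difference` is not visible in this diff, so the following is only a minimal sketch. It assumes the difference is expressed as a percentage of the expected value, and that a row fails when `percent_difference > percentage_tolerance` (the filter shown in the hunk header of `equality_with_numeric_tolerance.sql`). The function names and sample values are hypothetical.

```python
def percent_difference(actual: float, expected: float) -> float:
    """Absolute difference as a percentage of the expected value.

    Assumed formula for illustration; the package's SQL may compute it differently.
    """
    if expected == 0:
        # Avoid dividing by zero: two zeros differ by 0%, anything else by infinity.
        return 0.0 if actual == 0 else float("inf")
    return abs(actual - expected) / abs(expected) * 100.0


def within_tolerance(actual: float, expected: float, percentage_tolerance: float) -> bool:
    """A row passes unless percent_difference > percentage_tolerance."""
    return percent_difference(actual, expected) <= percentage_tolerance


OLD_TOLERANCE = 0.00000001  # value removed by this commit
NEW_TOLERANCE = 0.009       # value added by this commit

# Hypothetical values: an expected result and the same number rounded to 6 decimal places.
expected_value = 0.333333333333
rounded_value = 0.333333

passes_old = within_tolerance(rounded_value, expected_value, OLD_TOLERANCE)
passes_new = within_tolerance(rounded_value, expected_value, NEW_TOLERANCE)
print(passes_old, passes_new)
```

Under these assumptions, a value that differs only through rounding fails the old threshold and passes the new one, which is the kind of difference a looser tolerance absorbs.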

macros/label_encoder.sql

Lines changed: 4 additions & 0 deletions
@@ -57,3 +57,7 @@ from {{ source_table }}
 {% macro sqlserver__label_encoder(source_table,source_column,include_columns) %}
 {% do return( dbt_ml_preprocessing.redshift__label_encoder(source_table,source_column,include_columns)) %}
 {%- endmacro %}
+
+{% macro postgres__label_encoder(source_table,source_column,include_columns) %}
+{% do return( dbt_ml_preprocessing.redshift__label_encoder(source_table,source_column,include_columns)) %}
+{% endmacro %}

macros/quantile_transformer.sql

Lines changed: 4 additions & 0 deletions
@@ -70,4 +70,8 @@ from linear_interpolation_variables
 The `quantile_transformer` macro is only supported on Snowflake and BigQuery at this time. It should work on other DBs, it just requires some rework.
 {% endset %}
 {%- do exceptions.raise_compiler_error(error_message) -%}
+{% endmacro %}
+
+{% macro postgres__quantile_transformer(source_table,source_column,n_quantiles,output_distribution,subsample,include_columns) %}
+{% do return( dbt_ml_preprocessing.bigquery__quantile_transformer(source_table,source_column,n_quantiles,output_distribution,subsample,include_columns)) %}
 {% endmacro %}
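
The adapter-specific macros in this commit rely on dbt's dispatch naming convention: an implementation is looked up as `<adapter>__<macro_name>`, with `default__<macro_name>` as the fallback. The sketch below is a simplified Python model of that lookup, not dbt's actual code. The function `resolve_dispatch` is hypothetical, and it ignores details such as adapter inheritance and package search order. It illustrates that the prefix has to match the adapter type exactly (`postgres`), so a misspelt prefix such as `postgre__` is never selected.

```python
from typing import Iterable, Optional


def resolve_dispatch(macro_name: str, adapter_type: str, defined_macros: Iterable[str]) -> Optional[str]:
    """Return the macro name a simplified dispatch would pick, or None if nothing matches.

    Hypothetical helper: tries the adapter-specific name first, then the default.
    """
    available = set(defined_macros)
    for candidate in (f"{adapter_type}__{macro_name}", f"default__{macro_name}"):
        if candidate in available:
            return candidate
    return None


# Macro names as they appear in macros/label_encoder.sql after this commit.
label_encoder_macros = [
    "default__label_encoder",
    "redshift__label_encoder",
    "sqlserver__label_encoder",
    "postgres__label_encoder",
]
chosen = resolve_dispatch("label_encoder", "postgres", label_encoder_macros)

# A misspelt prefix is simply ignored, so the lookup falls through to the default.
fallback = resolve_dispatch(
    "quantile_transformer",
    "postgres",
    ["default__quantile_transformer", "postgre__quantile_transformer"],
)
print(chosen, fallback)
```

In this package the default implementation of `quantile_transformer` raises a compiler error, so a fallback of this kind would show up as an "only supported on Snowflake and BigQuery" message rather than as a silent success.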
