README.md (13 additions & 13 deletions)
@@ -4,29 +4,29 @@ A package for dbt which enables standardization of data sets. You can use it to
The package contains a set of macros that mirror the functionality of the [scikit-learn preprocessing module](https://scikit-learn.org/stable/modules/preprocessing.html). Originally they were developed as part of the 2019 Medium article [Feature Engineering in Snowflake](https://medium.com/omnata/feature-engineering-in-snowflake-4312032e0d53).
-Currently they have been tested in Snowflake, Redshift , BigQuery, and SQL Server. The test case expectations have been built using scikit-learn (see *.py in [integration_tests/data/sql](integration_tests/data/sql)), so you can expect behavioural parity with it.
+Currently they have been tested in Snowflake, Redshift , BigQuery, SQL Server and PostgreSQL 13.2. The test case expectations have been built using scikit-learn (see *.py in [integration_tests/data/sql](integration_tests/data/sql)), so you can expect behavioural parity with it.
The macros are:
-| scikit-learn function | macro name | Snowflake | BigQuery | Redshift | MSSQL | Example |
-| --- | --- | --- | --- | --- | --- | --- |
-|[KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer)| k_bins_discretizer | Y | Y | Y | N ||
-|[LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder)| label_encoder | Y | Y | Y | Y ||
-|[MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler)| max_abs_scaler | Y | Y | Y | Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#maxabsscaler)|
-|[MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)| min_max_scaler | Y | Y | Y | N |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#minmaxscaler)|
-|[Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer)| normalizer | Y | Y | Y | Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#normalizer)|
-|[OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder)| one_hot_encoder | Y | Y | Y | Y ||
-|[QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer)| quantile_transformer | Y | Y | N | N |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#quantiletransformer-uniform-output)|
-|[RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler)| robust_scaler | Y | Y | Y | N |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#robustscaler)|
-|[StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)| standard_scaler | Y | Y | Y | N |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#standardscaler)|
+| scikit-learn function | macro name | Snowflake | BigQuery | Redshift | MSSQL |PostgreSQL |Example |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+|[KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer)| k_bins_discretizer | Y | Y | Y | N |Y ||
+|[LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder)| label_encoder | Y | Y | Y | Y |Y ||
+|[MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler)| max_abs_scaler | Y | Y | Y | Y |Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#maxabsscaler)|
+|[MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)| min_max_scaler | Y | Y | Y | N |Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#minmaxscaler)|
+|[Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer)| normalizer | Y | Y | Y | Y |Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#normalizer)|
+|[OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder)| one_hot_encoder | Y | Y | Y | Y |Y ||
+|[QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer)| quantile_transformer | Y | Y | N | N |Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#quantiletransformer-uniform-output)|
+|[RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler)| robust_scaler | Y | Y | Y | N |Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#robustscaler)|
+|[StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)| standard_scaler | Y | Y | Y | N |Y |[](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#standardscaler)|
_\* 2D charts taken from [scikit-learn.org](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html), GIFs are my own_
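
For readers unfamiliar with the scalers listed in the table, the following is a minimal sketch of the arithmetic behind three of them, written in plain Python rather than with scikit-learn or the package's own SQL. The function names and the sample data are hypothetical and are not taken from the package. The formulas follow the documented scikit-learn definitions (for example, `StandardScaler` divides by the population standard deviation), and a column with no spread is assumed here to scale to zeros.

```python
import math


def min_max_scale(values):
    """Rescale to [0, 1], as MinMaxScaler does with its default feature_range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Assumption: a constant column maps to 0.0 for every row.
    return [(v - lo) / span if span else 0.0 for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, as MaxAbsScaler does."""
    peak = max(abs(v) for v in values)
    return [v / peak if peak else 0.0 for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / std if std else 0.0 for v in values]


# Hypothetical sample column, used only for illustration.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = min_max_scale(sample)
```

Values computed this way are the kind of expectation a SQL macro's output could be compared against, within a numeric tolerance.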
## Installation
To use this in your dbt project, create or modify packages.yml to include:
integration_tests/macros/equality_with_numeric_tolerance.sql (4 additions & 0 deletions)
@@ -60,6 +60,10 @@ where percent_difference > {{ percentage_tolerance }}
{% do return( redshift__test_equality_with_numeric_tolerance(model,compare_model,source_join_column,target_join_column,source_numeric_column_name,target_numeric_column_name,percentage_tolerance,output_all_rows=False)) %}
{% do return( redshift__test_equality_with_numeric_tolerance(model,compare_model,source_join_column,target_join_column,source_numeric_column_name,target_numeric_column_name,percentage_tolerance,output_all_rows=False)) %}
{% do return( dbt_ml_preprocessing.bigquery__quantile_transformer(source_table,source_column,n_quantiles,output_distribution,subsample,include_columns)) %}
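
The hunk header above shows the test filtering on `percent_difference > {{ percentage_tolerance }}`. The diff does not show how `percent_difference` is computed, so the sketch below assumes a relative difference against the expected value, expressed as a percentage. The function names and the dictionary-based row format are hypothetical and only illustrate the idea of a tolerance-based equality check in Python.

```python
def percent_difference(expected, actual):
    """Relative difference as a percentage (an assumed definition)."""
    if expected == 0:
        # Assumption: identical zeros match; any other value fails outright.
        return 0.0 if actual == 0 else float("inf")
    return abs(actual - expected) / abs(expected) * 100.0


def rows_outside_tolerance(expected_rows, actual_rows, percentage_tolerance):
    """Return the join keys whose values differ by more than the tolerance.

    Both arguments map a join key to a numeric value, standing in for the
    joined model and comparison model.
    """
    failures = []
    for key, expected in expected_rows.items():
        actual = actual_rows.get(key)
        # A key missing from the actual rows counts as a failure.
        if actual is None or percent_difference(expected, actual) > percentage_tolerance:
            failures.append(key)
    return failures
```

Under these assumptions, a call such as `rows_outside_tolerance({1: 100.0, 2: 200.0}, {1: 100.5, 2: 230.0}, 1.0)` flags only key `2`, since the first row differs by 0.5% and the second by 15%.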