Snowflake
Herminio Vazquez edited this page Feb 11, 2024 · 8 revisions
Working inside Snowflake means running data quality checks with `cuallee` from Snowflake Python Worksheets. The setup requires the following:
- `stage`: a stage under a data schema in Snowflake
- `cuallee.whl`: the wheel distribution of the latest version of `cuallee` from the PyPI index
- `wheel_loader.py`: a Python script developed by Snowflake Labs
- `anaconda`: dependencies added to your worksheet
- `warehouse`: the warehouse where the Python worksheet runs
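The stage and the two files in the list above can also be prepared from a script instead of the UI. The following is a minimal sketch, assuming you already have a Snowpark `Session` created outside Snowflake with rights to create stages; the helper name `stage_artifacts` is hypothetical and not part of `cuallee`, and `DEMO_STAGE` is simply the stage name used in this page.

```python
from pathlib import Path


def stage_artifacts(session, stage_name, local_files):
    """Create the stage if needed and upload each local file to it.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    Returns the stage paths to add under "Stage Packages".
    """
    # Internal named stage; IF NOT EXISTS makes the call safe to repeat.
    session.sql(f"CREATE STAGE IF NOT EXISTS {stage_name}").collect()

    uploaded = []
    for local_file in local_files:
        path = Path(local_file)
        # auto_compress=False keeps the .whl and .py files uncompressed,
        # so they keep their original names inside the stage.
        session.file.put(
            str(path),
            f"@{stage_name}",
            auto_compress=False,
            overwrite=True,
        )
        uploaded.append(f"@{stage_name}/{path.name}")
    return uploaded
```

Calling `stage_artifacts(session, "DEMO_STAGE", ["cuallee-0.8.5-py3-none-any.whl", "wheel_loader.py"])` would return the two stage paths to reference from the worksheet's package settings. Uploading through the UI or the `cli` achieves the same result.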
- Make sure that you have enabled and accepted the Anaconda Python Packages terms and conditions under `Admin > Billing & Terms`
- Select a schema in your Snowflake instance and create a `stage`; it does not matter whether it is internal or external. Let's call it `DEMO_STAGE`
- Go to the PyPI index and download the built distribution of `cuallee`. At the time of this writing the available file is `cuallee-0.8.5-py3-none-any.whl`
- Upload your `.whl` file into `DEMO_STAGE`, either via the `cli` or through the UI
- Download `wheel_loader.py`, the script developed by Snowflake Labs
- Upload your `wheel_loader.py` file into `DEMO_STAGE`, either via the `cli` or through the UI
- Create a new worksheet using the `+` sign in Snowflake Worksheets and select `Python Worksheet`
- In the top right corner of your worksheet, select the warehouse that will execute this worksheet
- In the top left corner of your worksheet, select the database schema that contains `DEMO_STAGE`
- Next to the schema selection and the settings drop-down menu, open the packages drop-down menu
- Two tabs will be available: Anaconda Packages and Stage Packages
- In Anaconda Packages, add the following library dependencies required by `cuallee`:
  - `colorama==0.4.6`
  - `pandas==1.5.3`
  - `pygments==2.15.1`
  - `requests==2.31.0`
  - `toolz==0.12.0`
  - `snowflake-snowpark-python==1.11.1`
- In Stage Packages, add the following library dependencies to use `cuallee`:
  - `@demo_stage/cuallee-0.8.5-py3-none-any.whl`
  - `@demo_stage/wheel_loader.py`
- After completing the package setup for both Anaconda and Stage, the added libraries should appear at the bottom of the drop-down, under Installed Packages
- At this point you are ready to go! Below is a snippet to test the use of `cuallee` inside Snowflake:
```python
# cuallee
# checks inside snowflake demo
import snowflake.snowpark as snowpark
import wheel_loader


def main(session: snowpark.Session):
    # Your code goes here, inside the "main" handler.
    # Load the staged wheel first so that cuallee becomes importable.
    wheel_loader.load('cuallee-0.8.5-py3-none-any.whl')
    from cuallee import Check, CheckLevel, Control

    check = Check(CheckLevel.WARNING, "Custom", session=session)
    tableName = 'snowflake_sample_data.tpch_sf100.lineitem'
    dataframe = session.table(tableName)
    check.is_greater_than("L_QUANTITY", 2)
    check.is_legit("L_COMMENT")
    # Return value will appear in the Results tab.
    return Control.completeness(dataframe, session=session).union(check.validate(dataframe))
```

- `pip install cuallee`
- `pip install cuallee[snowpark]` or `pip install snowflake-snowpark-python`
- Set environment variables to start a session
  - `SF_ACCOUNT`: obtained by clicking the bottom left part of your Snowflake account and selecting `Copy account url`
    - Then remove the `https://` part and also the `snowflakecomputing.com` part of the URL
    - It should end up looking something like this: `SF_ACCOUNT=1234567.region-name.cloud`
  - `SF_USER`: your Snowflake username
  - `SF_PASSWORD`: your Snowflake password
  - `SF_ROLE`: your Snowflake role, e.g. `ACCOUNTADMIN`
  - `SF_WAREHOUSE`: your designated warehouse for running data quality checks, e.g. `COMPUTE_WH`
  - `SF_DATABASE`: your database selection for running checks, e.g. `SNOWFLAKE_SAMPLE_DATA`
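The URL trimming and the variables listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not necessarily how `cuallee` consumes the variables internally: the helper names `account_identifier`, `connection_parameters`, and `create_session` are hypothetical, and the session is built explicitly with Snowpark's `Session.builder.configs(...)`.

```python
import os


def account_identifier(account_url):
    """Strip the scheme and the Snowflake domain from a copied account URL."""
    value = account_url.strip()
    if value.startswith("https://"):
        value = value[len("https://"):]
    value = value.rstrip("/")
    suffix = ".snowflakecomputing.com"
    if value.endswith(suffix):
        value = value[: -len(suffix)]
    return value


def connection_parameters(env=os.environ):
    """Map the SF_* variables onto Snowpark connection parameter names."""
    names = {
        "account": "SF_ACCOUNT",
        "user": "SF_USER",
        "password": "SF_PASSWORD",
        "role": "SF_ROLE",
        "warehouse": "SF_WAREHOUSE",
        "database": "SF_DATABASE",
    }
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}


def create_session(env=os.environ):
    """Open a Snowpark session from the SF_* variables."""
    # Imported lazily so the helpers above work without Snowpark installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(connection_parameters(env)).create()
```

For example, `account_identifier("https://1234567.region-name.cloud.snowflakecomputing.com")` returns `1234567.region-name.cloud`, which is the value to export as `SF_ACCOUNT`. The session returned by `create_session()` can then be passed to `Check(..., session=session)` in the same way as in the worksheet snippet.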