mwojtyczka/dbt-demo

Welcome to my dbt demo project!

Run dbt from a local machine

First, execute the `resources/input_data.sql` script in Databricks to set up the input tables.

Update `http_path` in `profiles.yml` to specify the HTTP path of the Databricks SQL warehouse you want to use to run the dbt SQL models.
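For reference, a minimal `profiles.yml` entry for the dbt-databricks adapter might look like the sketch below. The profile name, catalog, and schema shown here are assumptions, not values from this repository; adjust them to match your project.

```yaml
dbt_demo:          # assumption: must match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main          # assumption: your Unity Catalog name
      schema: dbt_demo       # assumption: your target schema
      host: "{{ env_var('DBT_HOST') }}"
      http_path: /sql/1.0/warehouses/<warehouse_id>   # the value to update
      token: "{{ env_var('DBT_ACCESS_TOKEN') }}"
```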

Just like SQL models, there are three ways to configure Python models (see the dbt documentation for details).

This project currently uses an all-purpose cluster or a new job cluster.

Then run the models as follows:

# define workspace connection
export DBT_ACCESS_TOKEN=<pat_token>
export DBT_CLUSTER_ID=<cluster_id>
export DBT_HOST=<workspace_url>

# compile, run, test all models and generate documentation
# see: https://docs.getdbt.com/reference/dbt-commands

# Compiles (but does not run) the models in a project
dbt compile

# Loads CSV files into the database
dbt seed

# Runs the models in a project
dbt run

# Runs the tests defined in a project
dbt test

# Generates documentation for a project
dbt docs generate

# Serves the documentation for a project
dbt docs serve --host "" --port 8001

# run only unit tests
# https://docs.getdbt.com/docs/build/unit-tests
dbt test --select "test_type:unit"

# models from specific folders can be run individually
dbt run --target dev --select src/models/sql_demo1

# run individual models
dbt run --target dev --select game_details win_loss_records

Ensure project best practices

Run the following commands to execute the dbt project evaluator:

# install dbt packages defined in packages.yml
dbt deps

# run the evaluation
dbt build --select package:dbt_project_evaluator
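For context, the `packages.yml` entry that pulls in the evaluator typically looks like the following. The version range is an assumption for illustration; pin whatever version is compatible with your dbt release.

```yaml
packages:
  - package: dbt-labs/dbt_project_evaluator
    version: [">=0.8.0", "<1.0.0"]   # assumption: adjust to your dbt version
```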

(Optional) Create separate tasks for each dbt model

By default, Databricks Workflows will run all dbt objects (seeds/snapshots/models/tests) in a single task.

To run each dbt object in a separate task, you can use the databricks-dbt-factory tool.

Install the library:

pip install databricks-dbt-factory

Update tasks in the job definition resources/dbt_sql_job.yml:

databricks_dbt_factory --dbt-manifest-path target/manifest.json --input-job-spec-path resources/dbt_sql_job.yml --target-job-spec-path resources/dbt_sql_job_explicit_tasks.yml --target '${bundle.target}' --project-directory ../ --profiles-directory . --environment-key Default --new-job-name dbt_sql_job_explicit_tasks

This will produce a workflow with a separate task for each dbt object.
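To illustrate, the generated job spec splits the single dbt task into per-object tasks roughly like the sketch below. The task keys and model names are illustrative only; the exact structure and naming are determined by databricks-dbt-factory.

```yaml
tasks:
  - task_key: game_details
    dbt_task:
      commands:
        - dbt run --select game_details
  - task_key: win_loss_records
    depends_on:
      - task_key: game_details   # ordering follows model dependencies
    dbt_task:
      commands:
        - dbt run --select win_loss_records
```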

Deploy the dbt project as a Databricks Job using Databricks Asset Bundles (DAB)

You can run the dbt project from a Databricks Job. To do that, deploy the job using Databricks Asset Bundles (DAB).

To deploy the Databricks job using DAB run the following:

databricks auth login --host <workspace-url>  # optional step to configure authentication (e.g. a PAT token)
databricks bundle deploy --target dev

You can optionally provide a profile (--profile) to specify the workspace and credentials (see the Databricks CLI documentation).

The job is deployed to the workspace specified in the `databricks.yml` file. The job definition can be found in `resources/dbt_sql_job.yml`.
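As a point of reference, the part of `databricks.yml` that controls where the bundle is deployed typically looks like the following sketch. The bundle name and host are placeholders, not values from this repository.

```yaml
bundle:
  name: dbt_demo   # placeholder: your bundle name

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<workspace-url>   # placeholder: your workspace URL
```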

The job will run dbt commands and will execute all models.

There are two types of compute used by Databricks dbt jobs:

  1. SQL warehouse - the compute used to execute the dbt models
  2. dbt CLI compute - the cluster used to execute the dbt CLI commands
