Skip to content

mzazakeith/dbt_cap

Repository files navigation

dbt Project

A dbt project that aggregates Community Health Worker (CHW) activity data into monthly summaries for performance dashboards.


Project Structure

main_task/
 models/
    starter_code/
       chw_activity_monthly.sql   # Main dbt model
       schema.yml                  # Tests & documentation
       sources.yml                 # Source table definition
    macros/
        month_assignment.sql        # Reusable macro for date logic
 dbt_project.yml                     # dbt project configuration
 packages.yml                        # External package dependencies
 business_requirements.md            # What the model needs to do
 schema_documentation.md             # Source table column definitions
 sample_data.sql                     # Example data for testing
 expected_output.md                  # What the output should look like

Key Concepts

1. Incremental Model

Instead of rebuilding the entire table every time, an incremental model only processes new or changed data. This is faster and more efficient for large datasets.

{{ config(
    materialized='incremental',
    unique_key=['chv_id', 'report_month'],
    incremental_strategy='delete+insert'
) }}

2. The 26th Cutoff Rule

This is the special business rule: activities on or after the 26th of a month are assigned to the NEXT month.

Activity Date Assigned To
2025-01-15 January
2025-01-26 February
2025-01-31 February
2025-12-26 January (next year!)

3. Macros

Macros are reusable code snippets in dbt. The month_assignment macro implements the 26th cutoff rule and can be used across multiple models:

{{ month_assignment('activity_date') }} as report_month

4. Tests

dbt tests validate data quality. This project tests:

  • not_null: Key fields must have values
  • unique_combination: No duplicate CHW + month combinations
  • expression_is_true: Counts must be >= 0

Prerequisites

  • Python 3.8+
  • PostgreSQL database
  • uv (Python package manager) or pip

Install Dependencies

# Install Python dependencies (dbt-core, dbt-postgres)
uv sync

# Install dbt packages (like dbt_utils)
uv run dbt deps

How to Run

1. Build the Model

# Full refresh (rebuilds entire table)
uv run dbt run --select chw_activity_monthly --full-refresh

# Incremental run (only processes recent data)
uv run dbt run --select chw_activity_monthly

2. Run Tests

uv run dbt test --select chw_activity_monthly

Data Quality Rules

The model excludes invalid records:

  • Records with NULL chv_id (unknown CHW)
  • Records with NULL activity_date (no date)
  • Records where is_deleted = TRUE (soft-deleted)

Useful Commands

# Clean compiled files
uv run dbt clean

# Compile SQL without running (useful for debugging)
uv run dbt compile --select chw_activity_monthly

# Show the compiled SQL
cat target/compiled/chw_analytics/models/starter_code/chw_activity_monthly.sql

# Run everything (all models + tests)
uv run dbt build --select chw_activity_monthly

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors