dbt Model Types Documentation

As models in dbt are intended to be reusable building blocks, instead of performing many different types of transformation in single SQL statements (e.g. nested joins and aggregations), each model in DJ is assigned a single type key which describes the primary operation it performs. This was, if two different downstream models need the same tranformation (e.g. an hourly aggregation), they can simply select from that aggregation building block.

Table of Contents

Overview
Model Types
Model Folder Structure
Quick Start Guide
Jaffle Shop Data Dictionary
Schema Validation
Common Patterns
Getting Help
Advanced Topics
Working Example Project

Overview

This comprehensive documentation covers all 11 supported dbt model types with real-world examples using jaffle shop data. Each model type serves a specific purpose in the data pipeline, from initial data ingestion through final analytics-ready datasets.

All examples are validated against their corresponding JSON schemas and use consistent jaffle shop business scenarios to demonstrate practical applications.

Model Types

Staging Models (Data Ingestion & Initial Processing)

Staging models handle the initial ingestion and basic transformation of raw data sources.

stg_select_source - Source data ingestion and initial transformations
- Purpose: Clean and standardize data directly from sources
- Use Case: Converting raw customer data into standardized format
- Key Features: Column selection, type conversion, basic transformations
stg_union_sources - Multi-source data consolidation
- Purpose: Combine similar data from multiple sources
- Use Case: Unioning order data from different store systems
- Key Features: Source consolidation, schema alignment, data lineage
stg_select_model - Staging model refinements
- Purpose: Further refine and transform staging data
- Use Case: Adding business logic to already-staged data
- Key Features: Model-based selection, additional transformations

Intermediate Models (Business Logic & Data Processing)

Intermediate models implement business logic, perform aggregations, and create enriched datasets.

int_select_model - Aggregations and business logic
- Purpose: Perform aggregations and implement business rules
- Use Case: Customer order summaries with metrics
- Key Features: Aggregations, grouping, filtering, business calculations
int_join_models - Multi-model joins for data enrichment
- Purpose: Combine data from multiple models through joins
- Use Case: Orders enriched with customer and product information
- Key Features: Multiple join types, complex relationships, data enrichment
int_join_column - Column unnesting and normalization
- Purpose: Unnest arrays and flatten complex data structures
- Use Case: Product tags analysis from array columns
- Key Features: Cross join unnest, data normalization, semi-structured data
int_union_models - Model consolidation
- Purpose: Combine processed data from multiple intermediate models
- Use Case: Consolidating feedback from surveys, reviews, and support
- Key Features: Model unioning, data standardization, multi-source integration
int_lookback_model - Time-based trailing analysis
- Purpose: Calculate metrics over trailing time periods
- Use Case: 30-day customer behavior trends
- Key Features: Rolling windows, time-based aggregations, trend analysis
int_rollup_model - Time-based aggregations
- Purpose: Roll up granular data to higher time intervals
- Use Case: Daily sales rollups from hourly transaction data
- Key Features: Time interval grouping, automatic aggregation, performance optimization

Mart Models (Analytics-Ready Presentation Layer)

Mart models provide the final, business-ready datasets optimized for reporting and analytics.

mart_select_model - Business intelligence datasets
- Purpose: Create analytics-ready datasets for BI tools
- Use Case: Customer analytics dashboard with business-friendly metrics
- Key Features: Business-friendly naming, unit conversion, executive metrics
mart_join_models - Comprehensive 360-degree views
- Purpose: Create sophisticated datasets by joining multiple intermediate models
- Use Case: Complete customer 360 view with orders, behavior, and preferences
- Key Features: Multi-model joins, executive reporting, comprehensive business intelligence

Model Folder Structure

Naming Convention

Pattern: <layer>__<dbt_group>__<model_topic>__<model_name>

Examples:

stg__customers__profiles__clean
stg__sales__orders__standardized
int__customers__profiles__summary
int__sales__orders__enriched
mart__customers__dashboard__analytics
mart__analytics__dashboard__comprehensive_analytics

Directory Organization

DJ automatically organizes your models into a logical folder structure:

models/
├── staging/
│   ├── customers/
│   │   └── profiles/
│   │       └── stg__customers__profiles__clean.sql
│   ├── sales/
│   │   ├── orders/
│   │   │   └── stg__sales__orders__standardized.sql
│   │   ├── items/
│   │   │   └── stg__sales__items__order_details.sql
│   │   └── stores/
│   │       └── stg__sales__stores__locations.sql
│   ├── products/
│   │   └── catalog/
│   │       └── stg__products__catalog__catalog.sql
│   └── supply_chain/
│       └── supplies/
│           └── stg__supply_chain__supplies__inventory.sql
├── intermediate/
│   ├── customers/
│   │   └── profiles/
│   │       └── int__customers__profiles__summary.sql
│   ├── sales/
│   │   └── orders/
│   │       └── int__sales__orders__enriched.sql
│   ├── products/
│   │   └── analytics/
│   │       └── int__products__analytics__product_popularity.sql
│   └── supply_chain/
│       └── supplies/
│           └── int__supply_chain__supplies__cost_analysis.sql
└── marts/
    ├── customers/
    │   └── dashboard/
    │       └── mart__customers__dashboard__analytics.sql
    ├── sales/
    │   └── reporting/
    │       ├── mart__sales__reporting__revenue.sql
    │       └── mart__sales__reporting__profitability.sql
    ├── products/
    │   └── reporting/
    │       ├── mart__products__reporting__menu_analytics.sql
    │       └── mart__products__reporting__cost_efficiency.sql
    └── analytics/
        └── dashboard/
            └── mart__analytics__dashboard__comprehensive_analytics.sql

Automatic File Management

Smart Placement: DJ places models in appropriate folders based on their type and configuration
Auto-Migration: When you update model configurations, DJ automatically moves files to maintain organization
Consistent Structure: Maintains standardized folder hierarchy across your entire project

Quick Start Guide

1. Choose Your Model Type

For Raw Data Ingestion:

stg_select_source - Clean and standardize data from a single source
stg_union_sources - Combine similar data from multiple sources
stg_select_model - Refine already-staged data with additional transformations

For Business Logic & Processing:

int_select_model - Perform aggregations and implement business rules
int_join_models - Enrich data by joining multiple models
int_join_column - Unnest arrays and flatten complex structures
int_union_models - Consolidate data from multiple intermediate models
int_lookback_model - Calculate metrics over trailing time periods
int_rollup_model - Roll up granular data to higher time intervals

For Analytics & Reporting:

mart_select_model - Create business-friendly datasets for BI tools
mart_join_models - Build comprehensive 360-degree business views

2. Follow the Data Pipeline Flow

Raw Sources → Staging Models → Intermediate Models → Mart Models → BI Tools
     ↓              ↓                    ↓                ↓           ↓
  Raw CSV      Clean & Prep        Business Logic    Analytics    Dashboards
  Database     Standardize         Aggregations      Ready Data   Reports
  APIs         Type Casting        Joins & Unions    Metrics      Analysis

Common Pipeline Patterns:

Simple Pipeline:

stg_select_source → Clean raw customer data
int_select_model → Calculate customer metrics
mart_select_model → Create customer analytics dataset

Complex Pipeline:

stg_union_sources → Combine order data from multiple stores
int_join_models → Enrich orders with customer & product data
int_rollup_model → Create daily sales summaries
mart_join_models → Build comprehensive sales analytics

3. Use Consistent Naming

Naming Components:

Component	Purpose	Examples
Layer	Model type prefix	`stg`, `int`, `mart`
Group	Business domain	`customers`, `sales`, `products`, `supply_chain`, `analytics`
Topic	Data subject area	`profiles`, `orders`, `items`, `stores`, `catalog`, `supplies`, `dashboard`, `reporting`
Name	Specific purpose	`clean`, `standardized`, `summary`, `enriched`, `analytics`, `comprehensive_analytics`

Real Examples:

stg__customers__profiles__clean - Clean raw customer profile data
stg__sales__orders__standardized - Standardized order data from raw sources
int__customers__profiles__summary - Customer profile summaries with business logic
int__sales__orders__enriched - Orders enriched with customer and store data
mart__customers__dashboard__analytics - Customer analytics for dashboards
mart__analytics__dashboard__comprehensive_analytics - Comprehensive business analytics

Best Practices:

✅ Use descriptive, business-friendly names
✅ Follow the layer__group__topic__name pattern consistently
✅ Keep names concise but meaningful
❌ Avoid abbreviations that aren't universally understood
❌ Don't use special characters or spaces

Jaffle Shop Data Dictionary

Our examples use consistent test data representing a fictional Jaffle shop chain with the following data structure:

Data Tables

raw_stores

Store locations across major US cities with tax rates and opening dates.

Column	Type	Description
`id`	UUID	Unique store identifier
`name`	String	Store location city name
`opened_at`	Timestamp	Store opening date (ISO 8601)
`tax_rate`	Decimal	Local tax rate (0.04 to 0.08)

Sample Stores:

Philadelphia: Opened 2016-09-01, 6% tax rate
Brooklyn: Opened 2017-03-12, 4% tax rate
Chicago: Opened 2018-04-29, 6.25% tax rate
San Francisco: Opened 2018-05-09, 7.5% tax rate
New Orleans: Opened 2019-03-10, 4% tax rate
Los Angeles: Opened 2019-09-13, 8% tax rate

raw_customers

Customer profiles with UUID identifiers and names.

Column	Type	Description
`id`	UUID	Unique customer identifier
`name`	String	Customer full name

Sample Customers: Misty Reed, Brandon Hill, Brad Williamson, Andrea Moore, Wyatt Bates, Adam Rowe, Nicole Hall, Jeffrey Gutierrez, John Bennett, and 928 others.

raw_orders

Order transactions linking customers to stores with financial details.

Column	Type	Description
`id`	UUID	Unique order identifier
`customer`	UUID	Foreign key to raw_customers.id
`ordered_at`	Timestamp	Order timestamp (ISO 8601)
`store_id`	UUID	Foreign key to raw_stores.id
`subtotal`	Integer	Subtotal in cents
`tax_paid`	Integer	Tax amount in cents
`order_total`	Integer	Total order amount in cents

raw_items

Individual line items for each order, linking to products.

Column	Type	Description
`id`	UUID	Unique line item identifier
`order_id`	UUID	Foreign key to raw_orders.id
`sku`	String	Product SKU (JAF-xxx or BEV-xxx)

raw_products

Complete product catalog with pricing and descriptions.

Column	Type	Description
`sku`	String	Product SKU identifier
`name`	String	Product display name
`type`	String	Product category (jaffle or beverage)
`price`	Integer	Product price in cents
`description`	String	Product description

Jaffle Products:

JAF-001: "nutellaphone who dis?" - $11.00 - Nutella and banana jaffle
JAF-002: "doctor stew" - $11.00 - House-made beef stew jaffle
JAF-003: "the krautback" - $12.00 - Lamb and pork bratwurst with sauerkraut
JAF-004: "flame impala" - $14.00 - Pulled pork and pineapple with ghost pepper sauce
JAF-005: "mel-bun" - $12.00 - Melon and minced beef bao in a jaffle

Beverage Products:

BEV-001: "tangaroo" - $6.00 - Mango and tangerine smoothie
BEV-002: "chai and mighty" - $5.00 - Oatmilk chai latte with protein boost
BEV-003: "vanilla ice" - $6.00 - Iced coffee with french vanilla syrup
BEV-004: "for richer or pourover" - $7.00 - Single estate beans pourover
BEV-005: "adele-ade" - $4.00 - Kiwi and lime agua fresca

raw_supplies

Supply chain data linking ingredients and supplies to products.

Column	Type	Description
`id`	String	Supply item identifier (SUP-xxx)
`name`	String	Supply item name
`cost`	Integer	Supply cost in cents
`perishable`	Boolean	Whether item is perishable
`sku`	String	Associated product SKU

Business Metrics

Revenue: All monetary values tracked in cents for precision, converted to dollars in marts
Dates: Consistent ISO 8601 format (YYYY-MM-DDTHH:MM:SS)
Categories: Products categorized as "jaffle" or "beverage"
Identifiers: All primary keys use UUID format for realistic data modeling
Supply Chain: Complete ingredient tracking from raw supplies to finished products

Schema Validation

All configuration examples in this documentation are validated against their corresponding JSON schemas located in /schemas/. This ensures:

Technical Accuracy: Every example can be used directly
Schema Compliance: All required fields and constraints are met
Best Practices: Examples follow recommended patterns

Common Patterns

Data Pipeline Stages

Staging: Clean, standardize, and prepare raw data
Intermediate: Apply business logic, perform joins and aggregations
Mart: Create analytics-ready, business-friendly datasets

Model Selection Decision Tree

Need to process raw sources? → Use stg_* models
Need business logic/aggregations? → Use int_* models
Need analytics-ready data? → Use mart_* models

Multiple sources to combine? → Use *_union_* models
Need to join related data? → Use *_join_* models
Need time-based analysis? → Use int_lookback_model or int_rollup_model

Configuration Best Practices

Always specify type: Identifies the model type
Use descriptive name: Clear, business-oriented naming
Organize with group and topic: Logical categorization
Document with description: Business context and purpose

Getting Help

Check the specific model documentation for detailed examples
Validate against schemas in /schemas/ directory
Review jaffle shop examples for consistent patterns
Test with sample data before production deployment

Advanced Topics

Inline CTEs and Pre-Aggregation

See CTE Patterns for conventions around inline CTEs (ctes array), where Lightdash metrics belong when pre-aggregating in a CTE, and the aggregation / auto-injection rules the framework enforces across the CTE boundary.

Model Dependencies

Understanding the relationship between model types:

Staging models feed intermediate models
Intermediate models feed mart models
Mart models feed BI tools and dashboards

Performance Optimization

Materialization: Choose appropriate strategy for data volume
Incremental Processing: Use for large, time-based datasets
Join Optimization: Order joins by data volume (smallest first)
Aggregation Strategy: Pre-aggregate in intermediate models

Incremental Strategies (dbt-trino)

Set materialization.strategy.type to one of the following (or rely on the extension default via dj.materialization.defaultIncrementalStrategy):

Strategy	Summary	Caveat
`append`	Inserts new rows with no de-duplication. Fastest.	Upstream must guarantee no duplicates in the new slice.
`delete+insert`	Partition-safe upsert. Safe default.	`unique_key` auto-derived from partition columns when omitted. Works on Delta Lake, Hive, and Iceberg.
`merge`	Row-level upsert on `unique_key`.	dbt-trino requires Iceberg format on the target table. On Delta Lake / Hive use `delete+insert` instead.
`overwrite_existing_partitions`	Drops and rewrites only the partitions present in the new slice. `unique_key` is not applicable, the consumer macro derives the partition list from the new slice itself, so the schema rejects `unique_key` on this strategy.	Requires a custom dbt macro in your project (e.g. `get_incremental_overwrite_existing_partitions_sql`). The DJ (Data JSON) Framework does NOT ship this macro and dbt-trino does NOT provide it natively. If your project does not define it, use `delete+insert` with a partition column as `unique_key`, it produces equivalent behavior for daily/monthly partitioned models.
`dj_iceberg_partition_overwrite`	Drops and rewrites only the partitions present in the new slice on Iceberg tables. `unique_key` is not applicable, the macro derives the partition list from the new slice itself by reading `properties.partitioning`, so the schema rejects `unique_key`.	Shipped by DJ. The dispatch macro `get_incremental_dj_iceberg_partition_overwrite_sql` lives in `macros/strategies.sql` and is auto-copied to `<project>/macros/_ext_/strategies.sql` on DJ: Refresh Projects. Requires Iceberg format on the target table, set `materialization.format: "iceberg"` or the project var `storage_type: iceberg`. DJ flags non-Iceberg use in the Problems tab. On Delta Lake / Hive use `delete+insert` instead.

Data Quality

Validation: Implement data quality checks at each stage
Testing: Use dbt tests for critical business logic
Monitoring: Track data freshness and completeness
Documentation: Maintain clear business definitions

Working Example Project

Want to see these model types in action? Check out the Jaffle Shop Example Project which demonstrates working models across a realistic business scenario.

This documentation provides comprehensive coverage of all dbt model types supported by the DJ (Data JSON) Framework. Each model type includes detailed examples, best practices, and integration guidance to help you build effective data pipelines.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dbt Model Types Documentation

Overview

Model Types

Staging Models (Data Ingestion & Initial Processing)

Intermediate Models (Business Logic & Data Processing)

Mart Models (Analytics-Ready Presentation Layer)

Model Folder Structure

Naming Convention

Directory Organization

Automatic File Management

Quick Start Guide

1. Choose Your Model Type

2. Follow the Data Pipeline Flow

3. Use Consistent Naming

Jaffle Shop Data Dictionary

Data Tables

raw_stores

raw_customers

raw_orders

raw_items

raw_products

raw_supplies

Business Metrics

Schema Validation

Common Patterns

Data Pipeline Stages

Model Selection Decision Tree

Configuration Best Practices

Getting Help

Advanced Topics

Inline CTEs and Pre-Aggregation

Model Dependencies

Performance Optimization

Incremental Strategies (dbt-trino)

Data Quality

Working Example Project

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

dbt Model Types Documentation

Overview

Model Types

Staging Models (Data Ingestion & Initial Processing)

Intermediate Models (Business Logic & Data Processing)

Mart Models (Analytics-Ready Presentation Layer)

Model Folder Structure

Naming Convention

Directory Organization

Automatic File Management

Quick Start Guide

1. Choose Your Model Type

2. Follow the Data Pipeline Flow

3. Use Consistent Naming

Jaffle Shop Data Dictionary

Data Tables

raw_stores

raw_customers

raw_orders

raw_items

raw_products

raw_supplies

Business Metrics

Schema Validation

Common Patterns

Data Pipeline Stages

Model Selection Decision Tree

Configuration Best Practices

Getting Help

Advanced Topics

Inline CTEs and Pre-Aggregation

Model Dependencies

Performance Optimization

Incremental Strategies (dbt-trino)

Data Quality

Working Example Project