Skill Boost: Data Modeling Techniques

Open in GitHub Codespaces

Overview

Welcome to the Data Modeling Skill Boost training session! This hands-on workshop explores three distinct data modeling techniques using the same dataset (TPC-H). You'll experience firsthand how different modeling approaches affect query complexity, maintainability, and analytical ease.

What You'll Learn

This training uses the TPC-H dataset (an industry-standard decision-support benchmark) to implement three different modeling techniques:

  1. Dimensional Modeling (Kimball) - Star schema with facts and dimensions
  2. Data Vault 2.0 - Flexible, auditable enterprise data warehouse
  3. One Big Table (OBT) - Fully denormalized approach

Each technique is implemented as a separate dbt project using DuckDB. You'll write queries against each model to answer business questions and compare the experience across approaches.

Getting Started

Option 1: GitHub Codespaces (Recommended)

Click the "Open in GitHub Codespaces" badge above. Everything is pre-configured and will be ready in 3-5 minutes!

The setup automatically:

  • Installs uv and all dependencies
  • Creates three DuckDB databases with TPC-H data
  • Configures the Python environment

Option 2: Local Setup

Prerequisites:

  • Python 3.11 or higher
  • Git

Installation:

# Clone the repository
git clone https://github.com/datamindedacademy/skill-boost-data-modeling
cd skill-boost-data-modeling

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate

# Follow the setup instructions in each project directory to initialize data and run dbt

Project Structure

.
├── dimensional-modeling/    # Star schema implementation
├── data-vault-20/          # Data Vault 2.0 implementation
├── one-big-table/          # Denormalized OBT implementation
├── questions/              # Business questions to answer
└── pyproject.toml          # Python dependencies

The Three Modeling Techniques

Each modeling approach directory contains detailed information about the philosophy, benefits, drawbacks, and references. Explore each to understand the trade-offs:

  • dimensional-modeling/ - Star schema with facts and dimensions (Kimball approach)
  • data-vault-20/ - Flexible, auditable enterprise warehouse with hubs, links, and satellites
  • one-big-table/ - Fully denormalized single table approach
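
For readers new to Data Vault, here is a minimal sketch of its core shapes: a hub holds the business key, a satellite historizes descriptive attributes, and a link (following the same pattern) would connect hubs. The table and column names are illustrative, not the workshop's actual models, and it uses Python's built-in sqlite3 for portability rather than DuckDB:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hub: one row per business key, plus load metadata.
# Satellite: descriptive attributes, historized by load timestamp.
cur.executescript("""
CREATE TABLE hub_customer (
    customer_hk    TEXT PRIMARY KEY,   -- hash of the business key
    customer_id    TEXT NOT NULL,      -- the business key itself
    load_ts        TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hk    TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_ts        TEXT NOT NULL,
    name           TEXT,
    market_segment TEXT,
    record_source  TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_ts)
);

INSERT INTO hub_customer VALUES ('abc123', 'CUST-42', '2024-01-01', 'tpch');
-- Two satellite rows: the customer's segment changed over time.
INSERT INTO sat_customer VALUES ('abc123', '2024-01-01', 'Acme', 'MACHINERY', 'tpch');
INSERT INTO sat_customer VALUES ('abc123', '2024-06-01', 'Acme', 'AUTOMOBILE', 'tpch');
""")

# Current view: the latest satellite row per hub key.
current = cur.execute("""
    SELECT h.customer_id, s.market_segment
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
    WHERE s.load_ts = (SELECT MAX(load_ts) FROM sat_customer
                       WHERE customer_hk = h.customer_hk)
""").fetchall()
print(current)  # [('CUST-42', 'AUTOMOBILE')]
```

Notice that nothing is ever updated in place: new facts arrive as new satellite rows, which is what makes the model auditable.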

Training Exercises

Navigate to the questions/ directory to find business questions to answer using each modeling technique. Compare:

  • Query Complexity: How many joins? How readable is the SQL?
  • Performance: How fast do queries execute?
  • Flexibility: How easy is it to answer new questions?
  • Maintainability: How would schema changes impact the model?
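
To make the query-complexity comparison concrete, here is a self-contained sketch of the same question ("total revenue per region") asked against a star schema and an OBT. It uses Python's built-in sqlite3 for portability and made-up table/column names, not the workshop's actual DuckDB schemas:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Star schema: a fact table plus two dimension tables.
CREATE TABLE dim_region   (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region_key INTEGER);
CREATE TABLE fact_sales   (customer_key INTEGER, revenue REAL);

INSERT INTO dim_region VALUES (1, 'EUROPE'), (2, 'ASIA');
INSERT INTO dim_customer VALUES (10, 1), (11, 2);
INSERT INTO fact_sales VALUES (10, 100.0), (10, 50.0), (11, 75.0);

-- One Big Table: the same data, fully denormalized.
CREATE TABLE obt_sales (region_name TEXT, revenue REAL);
INSERT INTO obt_sales VALUES ('EUROPE', 100.0), ('EUROPE', 50.0), ('ASIA', 75.0);
""")

# Star schema: two joins to reach the region attribute.
star = cur.execute("""
    SELECT r.region_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_region   r ON c.region_key   = r.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()

# OBT: no joins at all.
obt = cur.execute("""
    SELECT region_name, SUM(revenue)
    FROM obt_sales
    GROUP BY region_name
    ORDER BY region_name
""").fetchall()

print(star)  # [('ASIA', 75.0), ('EUROPE', 150.0)]
print(obt)   # same answer, zero joins
```

Both queries return the same numbers; what differs is the SQL you write, and what happens when a region is renamed (one dimension row in the star schema, many rows in the OBT).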

Visualizing with Apache Superset

Explore your data models visually using Apache Superset with automatic DuckDB connections to all three modeling approaches.

Quick Start

# One-time setup (first time only)
./superset-cli setup

# Start Superset
./superset-cli start

# Wait 1-2 minutes, then access at http://localhost:8088
# Login: admin / admin

# Add DuckDB database connections (if not already added)
./superset-cli connect

# Stop Superset when done
./superset-cli stop

Once connected, you can query and visualize data from all three models.

Tip: Run ./superset-cli help to see all available commands.

Contributing

Found an issue or have suggestions? Open an issue or submit a pull request!

License

This training material is provided for educational purposes.
