Welcome to the Data Modeling Skill Boost training session! This hands-on workshop explores three distinct data modeling techniques using the same dataset (TPC-H). You'll experience firsthand how different modeling approaches affect query complexity, maintainability, and analytical ease.
This training uses the TPC-H benchmark dataset (a standard dataset for database benchmarking) to implement three different modeling techniques:
- Dimensional Modeling (Kimball) - Star schema with facts and dimensions
- Data Vault 2.0 - Flexible, auditable enterprise data warehouse
- One Big Table (OBT) - Fully denormalized approach
Each technique is implemented as a separate dbt project using DuckDB. You'll write queries against each model to answer business questions and compare the experience across approaches.
Click the "Open in GitHub Codespaces" badge above. Everything is pre-configured and will be ready in 3-5 minutes!
The setup automatically:
- Installs uv and all dependencies
- Creates three DuckDB databases with TPC-H data
- Configures the Python environment
Prerequisites:
- Python 3.11 or higher
- Git
Installation:
# Clone the repository
git clone https://github.com/datamindedacademy/skill-boost-data-modeling
cd skill-boost-data-modeling
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Activate the virtual environment
source .venv/bin/activate
# Follow the setup instructions in each project directory to initialize data and run dbt.
skill-boost-data-modeling/
├── dimensional-modeling/ # Star schema implementation
├── data-vault-20/ # Data Vault 2.0 implementation
├── one-big-table/ # Denormalized OBT implementation
├── questions/ # Business questions to answer
└── pyproject.toml # Python dependencies
Each modeling approach directory contains detailed information about the philosophy, benefits, drawbacks, and references. Explore each to understand the trade-offs:
- dimensional-modeling/ - Star schema with facts and dimensions (Kimball approach)
- data-vault-20/ - Flexible, auditable enterprise warehouse with hubs, links, and satellites
- one-big-table/ - Fully denormalized single table approach
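To make the hub/satellite vocabulary concrete before you open the Data Vault project, here is a minimal toy sketch. It uses Python's built-in sqlite3 purely for portability (the workshop databases are DuckDB), and the table and column names (`hub_customer`, `sat_customer`, `customer_hk`) are illustrative, not the workshop's actual schema: a hub holds business keys, a satellite holds versioned descriptive attributes, and answering "what does this customer look like now?" means joining them and picking the latest satellite row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hub: one row per business key (toy schema, not the workshop's)
cur.execute("CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id INT)")
# Satellite: descriptive attributes, versioned by load date
cur.execute("""CREATE TABLE sat_customer (
    customer_hk TEXT, name TEXT, region TEXT, load_date TEXT)""")

cur.execute("INSERT INTO hub_customer VALUES ('hk1', 101)")
cur.executemany("INSERT INTO sat_customer VALUES (?, ?, ?, ?)", [
    ("hk1", "Acme", "EUROPE", "2024-01-01"),
    ("hk1", "Acme Corp", "EUROPE", "2024-06-01"),  # later version, same customer
])

# Current view: join hub to its latest satellite row
rows = cur.execute("""
    SELECT h.customer_id, s.name, s.region
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
    WHERE s.load_date = (SELECT MAX(load_date) FROM sat_customer
                         WHERE customer_hk = h.customer_hk)
""").fetchall()
print(rows)  # [(101, 'Acme Corp', 'EUROPE')]
```

Note the trade-off this structure buys: every historical version of the customer is retained and auditable, at the cost of an extra join (and a latest-row filter) for even simple questions.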
Navigate to the questions/ directory to find business questions to answer using each modeling technique. Compare:
- Query Complexity: How many joins? How readable is the SQL?
- Performance: How fast do queries execute?
- Flexibility: How easy is it to answer new questions?
- Maintainability: How would schema changes impact the model?
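The query-complexity axis can be felt in miniature with a hypothetical two-table example (again sqlite3 for portability, and invented columns rather than real TPC-H ones): the same revenue-by-region question needs a join in the star schema but none against a one-big-table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Star schema: a fact table with a foreign key into a dimension
cur.execute("CREATE TABLE dim_customer (customer_key INT, region TEXT)")
cur.execute("CREATE TABLE fct_orders (customer_key INT, revenue REAL)")
cur.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                [(1, "EUROPE"), (2, "ASIA")])
cur.executemany("INSERT INTO fct_orders VALUES (?, ?)",
                [(1, 100.0), (1, 50.0), (2, 70.0)])

# Star schema answer: one join at query time
star = cur.execute("""
    SELECT d.region, SUM(f.revenue)
    FROM fct_orders f JOIN dim_customer d ON f.customer_key = d.customer_key
    GROUP BY d.region ORDER BY d.region
""").fetchall()

# OBT: pre-join everything into one wide table once...
cur.execute("""CREATE TABLE obt_orders AS
    SELECT d.region, f.revenue
    FROM fct_orders f JOIN dim_customer d ON f.customer_key = d.customer_key""")
# ...then every analytical query is join-free
obt = cur.execute("""
    SELECT region, SUM(revenue) FROM obt_orders
    GROUP BY region ORDER BY region
""").fetchall()

print(star)  # [('ASIA', 70.0), ('EUROPE', 150.0)]
assert star == obt  # same answer, different query shapes
```

Keep this pattern in mind as you scale up to the real TPC-H questions, where star-schema queries may chain several dimension joins while the OBT query stays flat but the table itself grows wide and redundant.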
Explore your data models visually using Apache Superset with automatic DuckDB connections to all three modeling approaches.
# One-time setup (first time only)
./superset-cli setup
# Start Superset
./superset-cli start
# Wait 1-2 minutes, then access at http://localhost:8088
# Login: admin / admin
# Add DuckDB database connections (if not already added)
./superset-cli connect
# Stop Superset when done
./superset-cli stop

Once connected, you can query and visualize data from all three models.
Tip: Run ./superset-cli help to see all available commands.
Found an issue or have suggestions? Open an issue or submit a pull request!
This training material is provided for educational purposes.