A modern, scalable data engineering platform for managing multiple independent labs using dbt, DuckDB, and Python.
- **Quick Start** - Get running in 10 minutes
- **Architecture** - Understand the design
- **FAQ** - Common questions answered
- **Troubleshooting** - Fix common issues
BuildCPG Labs lets you:
✅ Run multiple independent data labs - Each with its own database, code, and configuration
✅ Share reusable utilities - DataInspector, CSVMonitor, validators across all labs
✅ Use consistent commands - make run, make test, make inspect for any lab
✅ Create new labs instantly - ./setup_new_lab.sh lab_name in 2 minutes
✅ Scale without complexity - Go from 1 lab to 100 with the same structure
✅ No Docker required - Runs natively on macOS 11+ and Linux, with no containers
buildcpg-labs/
├── shared/ # Reusable utilities (ALL labs use this)
│ ├── utils/ # DataInspector, CSVMonitor, config loaders
│ └── templates/ # Templates for new labs
├── config/ # Central configuration
│ ├── labs_config.yaml # Registry of all labs
│ └── paths.py # Path helpers
├── lab1_sales_performance/ # LAB 1 (independent)
├── lab2_forecast_model/ # LAB 2 (independent)
└── lab3.../ # LAB N (independent)
# Python 3.11+
python --version
# Git
git --version
# Make (for Makefile commands)
make --version
# 1. Clone
git clone https://github.com/narensham/buildcpg-labs.git
cd buildcpg-labs
# 2. Install dependencies
pip install pyyaml duckdb pandas
# 3. Verify
python config/paths.py
# ✅ Lab1 config loaded
# 4. Setup lab1
cd lab1_sales_performance
make setup
# 5. Run pipeline
make run
# 6. Inspect data
make inspect
That's it! You're ready.
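To confirm that the pipeline actually produced data, one option is to open the lab's DuckDB file and list its tables. The sketch below is an illustration only, not part of the repository: `list_tables` is a hypothetical helper, and the database location `data/lab1_sales_performance.duckdb` is assumed from the `labs_config.yaml` excerpt in this README.

```python
from pathlib import Path


def list_tables(db_path):
    """Return (schema, table) pairs found in a DuckDB file, opened read-only."""
    db_path = Path(db_path)
    if not db_path.is_file():
        raise FileNotFoundError(f"No database at {db_path}; has `make run` finished?")

    import duckdb  # imported lazily so the path check works without it

    con = duckdb.connect(str(db_path), read_only=True)
    try:
        return con.execute(
            "SELECT table_schema, table_name "
            "FROM information_schema.tables "
            "ORDER BY table_schema, table_name"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Assumed path, relative to the lab1_sales_performance directory.
    try:
        for schema, table in list_tables("data/lab1_sales_performance.duckdb"):
            print(f"{schema}.{table}")
    except FileNotFoundError as exc:
        print(exc)
```

Opening the file with `read_only=True` avoids taking a write lock, so the check should not interfere with a dbt run that is still in progress.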
cd lab1_sales_performance
make setup # Initialize lab (one time)
make run # Run dbt pipeline
make test # Run dbt tests
make inspect # Check data quality
make clean # Clean build artifacts
cd ..
./setup_new_lab.sh lab2_forecast_model
cd lab2_forecast_model
make run
├── shared/
│ ├── utils/
│ │ ├── data_inspector.py # Inspect any lab's database
│ │ ├── csv_monitor.py # Detect new data
│ │ └── config_loader.py # Access lab config
│ └── templates/
│ ├── Makefile # Template for new labs
│ └── requirements.txt
│
├── config/
│ ├── labs_config.yaml # All labs registered here
│ └── paths.py # Get paths to any lab
│
├── lab1_sales_performance/ # Lab 1 (complete example)
│ ├── dbt/ # dbt project
│ ├── data/ # Database and raw data
│ ├── scripts/ # Inspection scripts
│ ├── pipelines/ # Data pipelines
│ ├── Makefile # Lab commands
│ ├── requirements.txt # Python dependencies
│ └── venv/ # Virtual environment
│
├── lab2_forecast_model/ # Lab 2 (template)
├── lab3.../ # Lab 3 (template)
│
├── setup_new_lab.sh # Bootstrap script
├── mkdocs.yml # Documentation config
├── .gitignore # Git configuration
└── README.md # This file
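The central registry in `config/labs_config.yaml` is what lets shared tooling find any lab's database. A minimal sketch of reading it follows, assuming the file has a top-level `labs` mapping whose entries carry `path` and `db_path` keys, as in the excerpt in this README. The function names `load_registry` and `resolve_db_path` are hypothetical and are not the actual contents of `config/paths.py`.

```python
from pathlib import Path


def load_registry(config_file):
    """Parse labs_config.yaml and return the mapping under its `labs` key."""
    import yaml  # PyYAML, installed during Quick Start

    with open(config_file, encoding="utf-8") as handle:
        return yaml.safe_load(handle)["labs"]


def resolve_db_path(registry, lab_name, repo_root="."):
    """Return the DuckDB path for a registered lab, relative to repo_root."""
    if lab_name not in registry:
        known = ", ".join(sorted(registry)) or "none"
        raise KeyError(f"Unknown lab '{lab_name}'. Registered labs: {known}")
    return Path(repo_root) / registry[lab_name]["db_path"]


if __name__ == "__main__":
    config = Path("config/labs_config.yaml")
    if config.is_file():
        labs = load_registry(config)
        for name in labs:
            print(name, "->", resolve_db_path(labs, name))
    else:
        print(f"{config} not found; run this from the repository root.")
```

Keeping the lookup separate from the YAML parsing means the path logic can be exercised with a plain dictionary, without touching the file system.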
- Shared utilities (DataInspector, CSVMonitor)
- Central configuration
- Lab1 ready to use
- Makefile templates
- Bootstrap script for new labs
- Automated setup
- Orchestration (Airflow/Prefect)
- Advanced monitoring
- Data quality gates
- Multi-database support
Written once in shared/, used by all labs:
# DataInspector - inspect any database
from shared.utils.data_inspector import DataInspector
inspector = DataInspector('data/lab1_sales_performance.duckdb')
quality_score = inspector.get_quality_score('gold', 'summary')
All labs registered in one place:
# config/labs_config.yaml
labs:
  lab1_sales_performance:
    path: lab1_sales_performance
    db_path: lab1_sales_performance/data/lab1_sales_performance.duckdb
  lab2_forecast_model:
    path: lab2_forecast_model
    db_path: lab2_forecast_model/data/lab2_forecast_model.duckdb
Every lab uses the same Makefile:
make setup # Works for any lab
make run # Works for any lab
make test # Works for any lab
# 1. Create lab2
./setup_new_lab.sh lab2_forecast_model
# 2. Work on it
cd lab2_forecast_model
make run
make inspect
# 3. Create lab3
cd ..
./setup_new_lab.sh lab3_customer_segmentation
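# 4. Run all labs in sequence (an orchestrator such as Airflow can take this over later)
for lab in lab1_sales_performance lab2_forecast_model lab3_customer_segmentation; do
  # Subshell: a failed cd or make cannot leave the loop in the wrong directory
  (cd "$lab" && make run && make test)
done
Full documentation is available at: https://buildcpg-labs.github.io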
- Quick Start - 10-minute setup
- Installation - Detailed installation
- Architecture - System design
- Multi-Lab Design - How labs work together
- Phase 1 - Foundation setup
- Troubleshooting - Common issues
- FAQ - Frequently asked questions
- Python 3.11+ - Scripting and tooling
- dbt - Data transformation
- DuckDB - Embedded database (no server needed)
- MkDocs - Documentation
- Git - Version control
- Make - Command automation
| Requirement | Minimum | Recommended |
|---|---|---|
| Python | 3.11 | 3.12+ |
| OS | macOS 11+ / Linux | macOS 12+ / Ubuntu 20.04+ |
| Disk | 1 GB | 10 GB |
| RAM | 2 GB | 4 GB+ |
| Docker | Not needed ✅ | Not needed ✅ |
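The software requirements in the table can be checked before installing anything. The following is a rough sketch rather than a script shipped with the repository: `missing_prerequisites` is a hypothetical name, and it covers only the Python version plus the `git` and `make` commands mentioned in this README (disk and RAM are not checked).

```python
import shutil
import sys

MIN_PYTHON = (3, 11)              # minimum version from the requirements table
REQUIRED_TOOLS = ("git", "make")  # command-line tools this README relies on


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print(f"MISSING: {issue}")
    if not issues:
        print("All prerequisites found.")
```

The version tuple and the lookup function are parameters so the check can be tested without depending on the machine it runs on.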
git clone https://github.com/narensham/buildcpg-labs.git
cd buildcpg-labs
pip install pyyaml duckdb pandas
python config/paths.py
git clone https://github.com/narensham/buildcpg-labs.git
cd buildcpg-labs
chmod +x setup.sh
./setup.sh
See Installation Guide for detailed instructions.
- Quick Start - Setup and run in 10 minutes
- Architecture - Understand the design
- Create New Lab - Add your own lab
- Troubleshooting - Fix issues
Contributions welcome! See Contributing Guide for details.
MIT License - see LICENSE file for details.
Created by: narensham
Last Updated: January 2025
Repository: GitHub
Documentation: MkDocs