Automated test suite that runs all notebooks in Docker and checks for errors and empty SELECT queries.
Important: The test suite automatically runs notebooks in the correct order:
- Setup.ipynb runs FIRST (prerequisite - downloads data and initializes environment)
- All other notebooks run independently after Setup completes
Each video notebook (Module1, Module2, Module3) can run completely independently after Setup.
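This ordering can be enforced with a pytest collection hook. The sketch below reorders collected tests so the Setup test runs first; matching on "setup" in the test name is an assumption about how the tests are named, not necessarily the suite's actual implementation:

```python
# conftest.py -- a minimal sketch of forcing the Setup notebook's test to
# run before the module notebook tests. Matching on "setup" in the test
# name is an assumption about the naming convention.

def pytest_collection_modifyitems(session, config, items):
    """Stable-sort collected tests: Setup-related tests first,
    everything else kept in its original order afterwards."""
    items.sort(key=lambda item: 0 if "setup" in item.name.lower() else 1)
```

Because `list.sort` is stable, the relative order of the module notebook tests is preserved.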
```bash
# Install dependencies
pip install -r requirements-test.txt

# Run all notebook tests (Setup runs first automatically)
pytest test_notebooks.py -v

# Test a specific notebook
pytest test_notebooks.py -k "Module1"

# Run in parallel (faster)
pytest test_notebooks.py -n auto

# Run as a standalone Python script
python test_notebooks.py
```

The suite verifies that:
- ✅ All cells execute without errors
- ✅ SELECT queries return data rows (not empty)
- ✅ No Python exceptions or broken cells
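A minimal sketch of the kind of checks the suite performs, assuming each notebook has already been executed (for example with nbconvert's `ExecutePreprocessor`). The helper below only inspects an executed notebook's cell outputs; the field names follow the nbformat JSON schema, but the SELECT heuristic is an illustrative assumption:

```python
def find_failures(nb):
    """Scan an executed notebook (an nbformat-style dict) for cells that
    raised an exception and for SELECT cells that produced no visible
    output. Returns a list of human-readable problem descriptions."""
    problems = []
    for i, cell in enumerate(nb["cells"]):
        if cell.get("cell_type") != "code":
            continue
        outputs = cell.get("outputs", [])
        # Any "error" output means the cell raised an exception.
        if any(o.get("output_type") == "error" for o in outputs):
            problems.append(f"cell {i}: raised an exception")
        # Heuristic: a cell containing a SELECT should print some rows.
        if "select" in cell.get("source", "").lower():
            text = "".join(
                o.get("text", "")
                for o in outputs
                if o.get("output_type") == "stream"
            )
            if not text.strip():
                problems.append(f"cell {i}: SELECT query returned no rows")
    return problems
```

A real check would also need to handle `execute_result` outputs and list-valued stream text; this sketch keeps only the core idea.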
The offline JAR mode allows Spark to use pre-downloaded JARs instead of fetching them from Maven Central at runtime. This is tested by a dedicated CI workflow (.github/workflows/test-offline-jars.yml) and can also be run locally.
```bash
# 1. Download JARs (use --insecure if behind a corporate proxy)
./manual-download-dependencies.sh

# 2. Start services (JARs are auto-mounted from ./jars/)
docker compose up -d

# 3. Verify the config uses local JARs
docker exec jupyter-spark grep "^spark.jars=" /home/jovyan/.sparkconf/spark-defaults.conf

# 4. Run a notebook test to confirm everything works
pytest test_notebooks.py -v -k "test_01_setup or E1_2_DataModeling" --tb=short

# 5. Clean up
docker compose down -v
rm -rf jars/
```

The CI workflow (and a local run of the steps above) verifies that:
- `manual-download-dependencies.sh` downloads all expected JARs
- The generated `spark-defaults.conf` uses `spark.jars=` (local paths) instead of `spark.jars.packages=`
- A notebook runs successfully with Spark loading from local JARs
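In `spark-defaults.conf`, the difference between the two modes looks roughly like this (the dependency coordinate and path below are illustrative placeholders, not the repo's actual JAR list):

```
# Online mode: Spark resolves artifacts from Maven Central at startup
spark.jars.packages=org.example:example-dependency:1.0.0

# Offline mode: the generated config points at pre-downloaded local JARs
spark.jars=/home/jovyan/jars/example-dependency-1.0.0.jar
```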
Prerequisites:
- Docker running with the `jupyter-spark` container
- Python packages: pytest, nbconvert, nbformat (installed via requirements-test.txt)