This project contains the following containers:
- postgres: Postgres database for Airflow metadata, plus a test database for your own experiments.
  - Image: postgres:9.6
  - Database Port: 5432
  - References: https://hub.docker.com/_/postgres
- airflow-webserver: Airflow webserver and scheduler.
  - Image: docker-airflow-pyspark:latest
  - Port: 8080
- jupyter-spark: Jupyter notebook with PySpark for interactive development.
  - Image: jupyter/pyspark-notebook
  - Port: 8888
  - References:
git clone https://github.com/cordon-thiago/airflow-pyspark
sudo docker pull postgres:9.6
sudo docker pull jupyter/pyspark-notebook:latest
Inside <project folder>/docker/docker-airflow, build the Airflow image:
sudo docker build --rm -t docker-airflow-pyspark .
Navigate to <project folder>/docker and start the containers:
sudo docker-compose up
To run the containers in the background instead:
sudo docker-compose up -d
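When the stack is started in the background, the services may take a while to become reachable. The following is a minimal sketch of a polling helper and is not part of this project: `retry` is a hypothetical function name, and the check in the usage note assumes `curl` is installed on the host.

```shell
# Hypothetical helper (not part of this project): run a command until it
# succeeds, up to a maximum number of attempts, sleeping between tries.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    count=1
    until "$@"; do
        if [ "$count" -ge "$attempts" ]; then
            return 1    # give up: the command never succeeded
        fi
        count=$((count + 1))
        sleep "$delay"
    done
    return 0
}
```

For example, `retry 30 2 curl -fsS -o /dev/null http://localhost:8080` would poll the Airflow webserver port every 2 seconds for up to 30 attempts, assuming `curl` is available.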
Airflow: http://localhost:8080
Postgres - Database test:
- Server: localhost:5432
- Database: test
- User: test
- Password: postgres
Postgres - Database airflow:
- Server: localhost:5432
- Database: airflow
- User: airflow
- Password: airflow
Jupyter Notebook: http://127.0.0.1:8888
- For the Jupyter notebook, copy the URL containing the token that is generated when the container starts and paste it into your browser. The URL with the token can be taken from the container logs using:
docker logs -f docker_jupyter-spark_1
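The access details above can also be checked from a terminal. This is a minimal sketch under stated assumptions: `pg_uri` and `jupyter_token_url` are hypothetical helper names, the token pattern assumes the notebook logs a URL of the form `http://127.0.0.1:8888/...token=...`, and the Docker call is skipped when the `docker` CLI is not found.

```shell
# Hypothetical helper: build a Postgres connection URI from the values listed above.
# Arguments: user, password, database; host and port default to localhost:5432.
pg_uri() {
    printf 'postgresql://%s:%s@%s:%s/%s\n' "$1" "$2" "${4:-localhost}" "${5:-5432}" "$3"
}

# Hypothetical helper: read log text on stdin and print the last URL that
# carries a token (assumed log format, see the note above).
jupyter_token_url() {
    grep -Eo 'https?://[^[:space:]]+token=[[:alnum:]]+' | tail -n 1
}

# Print the URIs for the two databases described in this section.
pg_uri test postgres test
pg_uri airflow airflow airflow

# Only query Docker when the CLI is present. Without -f, "docker logs" prints
# the current log and exits, so the pipeline does not block.
if command -v docker >/dev/null 2>&1; then
    docker logs docker_jupyter-spark_1 2>&1 | jupyter_token_url
fi
```

If the `psql` client is installed on the host (an assumption, it is not part of this setup), `psql "$(pg_uri test postgres test)" -c 'SELECT 1;'` is one way to confirm that the test database accepts connections.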
- Open the Airflow web UI at http://localhost:8080 and go to Connections.
- Edit the spark_default connection, entering localhost in the Host field.
- Run the spark-test DAG.
Rebuild the Docker image, passing the extra packages as a build argument:
sudo docker build --rm --build-arg AIRFLOW_DEPS="gcp" -t docker-airflow-pyspark .
After the build succeeds, run docker-compose to start the containers:
sudo docker-compose up
More info at: https://github.com/puckel/docker-airflow#build
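When more than one extra is needed, the upstream puckel/docker-airflow build takes them as a single comma-separated value. The sketch below assumes this project's Dockerfile handles `AIRFLOW_DEPS` the same way; `join_extras` is a hypothetical helper, and `slack` is only an example extra that this document does not mention. It prints the build command for review rather than running it.

```shell
# Hypothetical helper: join its arguments with commas, e.g. "gcp,slack".
join_extras() {
    old_ifs=$IFS
    IFS=,
    printf '%s\n' "$*"    # "$*" joins arguments with the first character of IFS
    IFS=$old_ifs
}

# Assumption: AIRFLOW_DEPS accepts a comma-separated list, as in puckel/docker-airflow.
deps=$(join_extras gcp slack)

# Print the command instead of executing it.
echo "sudo docker build --rm --build-arg AIRFLOW_DEPS=\"$deps\" -t docker-airflow-pyspark ."
```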
Inside the project directory (airflow-pyspark):
docker exec -it docker_jupyter-spark_1 spark-submit --master local /home/jovyan/work/spark-scripts/hello-world.py
List Images:
sudo docker images <repository_name>
List Containers:
sudo docker container ls
Check container logs:
sudo docker logs -f <container_name>
Rebuild an image after changing its Dockerfile (run inside the directory containing the Dockerfile):
sudo docker build --rm -t <tag_name> .
Open a bash shell inside a container:
sudo docker exec -i -t <container_name> /bin/bash
Start Containers:
sudo docker-compose -f <compose-file.yml> up -d
Stop Containers:
sudo docker-compose -f <compose-file.yml> down --remove-orphans