This repository contains a docker-compose setup for running a local Airflow environment for development purposes. Do not use it in production.
Start the environment with:

```bash
docker-compose up
```
Be aware that the Dockerfile does not pin the Airflow Docker image version, so the latest version will be installed. It is recommended to pin the same version you run in your production environment.
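For example, pinning could look like this in the Dockerfile (the `2.7.3` tag is purely a placeholder; use your production version):

```dockerfile
# Pin the base image to the Airflow version running in production
FROM apache/airflow:2.7.3
```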
Add your DAGs to the `dags` folder.
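As a minimal sketch, a file dropped into `dags/` might look like this (the DAG and task names are illustrative):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# Hypothetical example DAG; place this file in the dags/ folder
with DAG(
    dag_id="example_hello",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")
    start >> end  # declare the upstream/downstream dependency
```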
In `secrets/variables.yaml` you can add variables to be accessed from your DAGs.
In `secrets/connections.yaml` you can add connections to be accessed from your DAGs.
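Assuming these files follow the format of Airflow's local filesystem secrets backend (an assumption about this setup), they might look like the following hypothetical examples:

```yaml
# secrets/variables.yaml -- hypothetical example values
environment: dev
s3_bucket: my-dev-bucket
```

```yaml
# secrets/connections.yaml -- hypothetical example connection
my_postgres:
  conn_type: postgres
  host: postgres
  login: airflow
  password: airflow
  port: 5432
  schema: mydb
```

Inside a DAG these would then be read with `Variable.get("environment")` and `BaseHook.get_connection("my_postgres")`.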
When the environment is ready, the Airflow web UI will prompt for a username and password. The default credentials are:
- user: airflow
- pass: airflow
The DAG validator is a Python file, run with pytest, that identifies unexpected orphaned Airflow tasks, i.e. tasks with missing or unexpected upstream and downstream dependencies. It inspects the dependencies declared in each DAG (relationships such as `task1 >> task2`) to ensure the orchestrated data pipeline behaves as expected.
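Below is a minimal sketch of this kind of check, assuming the DAGs live in `dags/` (the actual assertions in `tests/test_dag_output.py` may differ):

```python
import pytest
from airflow.models import DagBag


@pytest.fixture(scope="session")
def dagbag():
    # Load every DAG from the dags/ folder without executing any tasks
    return DagBag(dag_folder="dags", include_examples=False)


def test_no_orphaned_tasks(dagbag):
    # Fail fast if any DAG file could not be imported
    assert not dagbag.import_errors

    # A task is "orphaned" if it has neither upstream nor downstream
    # dependencies while its DAG contains other tasks
    orphans = [
        f"{dag_id}.{task.task_id}"
        for dag_id, dag in dagbag.dags.items()
        if len(dag.tasks) > 1
        for task in dag.tasks
        if not task.upstream_task_ids and not task.downstream_task_ids
    ]
    assert not orphans, f"Orphaned tasks found: {orphans}"
```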
To execute the DAG validation tests, use the following command:

```bash
pytest -s tests/test_dag_output.py
```
The pytest suite comprises two tests, each addressing a specific aspect of DAG validation (a hedged sketch of the second check follows this list):
- Find Orphaned Models with No Upstream Tasks (all layers, e.g. staging_to_source, landing-to-staging)
- Find Incorrectly Mapped Upstream Layers (all layers, e.g. staging_to_source, landing-to-staging)
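As an illustration of the layer-mapping check, assuming task IDs encode their layer as a prefix (the naming convention and the layer map below are purely assumptions) and reusing the `dagbag` fixture from the sketch above:

```python
# Hypothetical layer ordering: a task in a given layer should only
# depend on tasks from its expected upstream layer
EXPECTED_UPSTREAM = {"staging": "landing", "source": "staging"}


def test_upstream_layers_are_correct(dagbag):
    bad_edges = []
    for dag in dagbag.dags.values():
        for task in dag.tasks:
            layer = task.task_id.split("_")[0]  # assumed "<layer>_..." naming
            expected = EXPECTED_UPSTREAM.get(layer)
            if expected is None:
                continue
            for upstream_id in task.upstream_task_ids:
                if not upstream_id.startswith(expected):
                    bad_edges.append(f"{dag.dag_id}: {upstream_id} >> {task.task_id}")
    assert not bad_edges, f"Unexpectedly mapped upstream layers: {bad_edges}"
```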
You can modify or add tests as needed to enhance the DAG validation process.
Before you begin, ensure you have met the following requirements:
- Python: This project requires Python. If you haven't installed it yet, you can download it from python.org or use your system's package manager.
You can install the project dependencies by running:

```bash
pip install -r requirements.txt
```