Welcome! This repository is the companion to the Orchestrating Workflows for GenAI Applications course on DeepLearning.AI. It contains all the Airflow 3 dags you build throughout the course and can be run locally using the Astro CLI.

Follow these steps to spin up Airflow and Weaviate locally:
- **Fork and clone this repository**: Click the "Fork" button on the top right of this page to create a copy of this repository in your GitHub account. Then clone it to your local machine and enter its directory:

  ```bash
  git clone <your-forked-repo-url>
  cd <your-forked-repo-name>
  ```
- **Install the Astro CLI**: Follow the installation instructions to install the Astro CLI on your local machine. If you are on a Mac, you can install it using Homebrew:

  ```bash
  brew install astro
  ```

  If you already have the Astro CLI installed, make sure you are at least on version 1.34.1 by running:

  ```bash
  astro --version
  ```
- **Create a `.env` file**: Create a `.env` file in the root of your project directory. Copy the contents of the `.env.example` file into your new `.env` file. This file contains the environment variable Airflow uses to connect to Weaviate. If you'd like to connect to your own, cloud-hosted Weaviate instance, you can provide your own connection details here. Note that you have the option to enter an OpenAI API key, but this is not required for the course pipelines to run.
- **Start the project** by running the following command:

  ```bash
  astro dev start
  ```

  After running this command, the Astro CLI will start 5 containers running Airflow components (Scheduler, API Server, Triggerer, Dag Processor, and the Postgres Metadata Database), either using Docker, if available on your machine, or using Podman, which the Astro CLI sets up for you. Alongside these containers, a local Weaviate instance, defined in the `docker-compose.override.yml` file, also starts. Once Airflow is ready, it will open the Airflow UI in your browser at `http://localhost:8080/`. You do not need any credentials to log in.

  **Note**: If you already have port `8080` or port `5432` allocated, you can either stop your existing containers or change the port.
- **Access the Airflow UI**: Open your web browser and navigate to `http://localhost:8080/`. You should see the Airflow UI, where you can view and manage your dags.
- **Run the example dags**: In the Airflow UI, click on the "Dags" button to see the dag list. Unpause the `query_data` and `fetch_data` dags to run the example.
- **Experiment!** Modify the dags in the `dags` folder to add your own tasks, change the existing ones, or create new dags. You can also add new files to the `include` folder and reference them in your dags.
- **Deploy your dags to Astronomer**: You can start a free trial and follow the sign-up flow to create your first deployment. Once you have a deployment, you can push your code to Astronomer by running:

  ```bash
  astro login
  astro deploy
  ```

  For other deployment options, refer to the Astronomer documentation.
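As an illustration of the kind of entry the `.env` file holds: Airflow can read connections from environment variables named `AIRFLOW_CONN_<CONN_ID>`, serialized as a URI or JSON. The connection id and host below are assumptions for illustration only; copy the real key and value from `.env.example`.

```text
# Illustrative only — use the exact variable from .env.example.
# Airflow picks up connections from AIRFLOW_CONN_<CONN_ID> variables.
AIRFLOW_CONN_MY_WEAVIATE_CONN='{"conn_type": "weaviate", "host": "http://weaviate:8081"}'
```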
This repository contains the following files and folders:
- `.astro`: This folder contains advanced configuration files. You don't need to change anything here.
- `dags`: This folder contains the Python files for your Airflow dags. By default, this directory includes the following example dags:
  - `genai_dags`
    - `fetch_data.py`: This dag extracts book descriptions from the files stored in the `include/data` folder, creates vector embeddings for them using `fastembed`, and stores those embeddings in Weaviate. It also creates a Weaviate class called `Book` to store the book descriptions, if it does not already exist.
    - `query_data.py`: This dag queries the Weaviate instance for book descriptions based on a user-provided query. It uses the `fastembed` package to create an embedding for the query and retrieves the most similar book descriptions from Weaviate.
  - `practice_dags`
    - `my_first_dag.py`: A very simple dag consisting of 3 tasks.
    - `my_second_dag.py`: A slightly more complex dag consisting of 4 tasks.
    - `simple_dynamic_task_mapping.py`: A simple dag that shows how to use dynamic task mapping.
  - `.airflowignore`: This file specifies which files and folders should be ignored by Airflow when scanning for dags. You can add any files or folders you want to ignore here.
- `include`: This folder contains any additional files that you want to include as part of your project.
  - `data`: This folder contains book description files. You can add your own descriptions either in the existing files or in a new file if you want to query your own favorite books. Make sure to follow the same format as the existing files:

    ```text
    <integer index> ::: <title> (<release year>) ::: <author> ::: <description>
    ```
- `plugins`: Add custom Airflow plugins for your project to this folder.
- `src/img`: This folder contains images used in the README.
- `tests`: This folder contains unit tests and dag validation tests for your dags. You can add your own tests here to ensure your dags work as expected. All tests can be run using the command `astro dev pytest` from the root of your project directory. Note that you can use any testing framework you like, not just pytest. Some example dag validation tests are included in the `tests/test_dag_example.py` file.
- `.dockerignore`: This file specifies which files and folders should be ignored by Docker when building the image. You can add any files or folders you want to ignore here.
- `.env.example`: This file contains the environment variable defining the connection between Airflow and Weaviate. You can copy this file to create your own `.env` file. You can add additional environment variables here if needed and retrieve them from within dags using `os.getenv()`.
- `.gitignore`: This file specifies which files and folders should be ignored by Git when committing changes. You can add any files or folders you want to ignore here.
- `docker-compose.override.yml`: This file contains the configuration for the local Weaviate instance that runs alongside Airflow. You can modify it to change the Weaviate version or add additional Weaviate configuration.
- `Dockerfile`: This file contains a versioned Astro Runtime Docker image that includes open source Airflow and some additional packages. If you want to execute other commands or overrides at runtime, specify them here.
- `packages.txt`: Install OS-level packages needed for your project by adding them to this file.
- `README.md`: This document.
- `requirements.txt`: Install Python packages needed for your project by adding them to this file. For this project, `fastembed` and the Airflow Weaviate Provider are included. If you'd like to use other Python packages in your task code, add them here.
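The `include/data` record format described above can be sketched as a small parser. `parse_book_line` is a hypothetical helper for illustration only; the course dags ship their own loading logic.

```python
import re

# One record per line:
# <integer index> ::: <title> (<release year>) ::: <author> ::: <description>
LINE_PATTERN = re.compile(
    r"^(?P<index>\d+) ::: (?P<title>.+) \((?P<year>\d{4})\) ::: "
    r"(?P<author>.+) ::: (?P<description>.+)$"
)

def parse_book_line(line: str) -> dict:
    """Parse one book description line into a dict of typed fields."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Line does not match the expected format: {line!r}")
    record = match.groupdict()
    record["index"] = int(record["index"])
    record["year"] = int(record["year"])
    return record

book = parse_book_line(
    "1 ::: The Hobbit (1937) ::: J. R. R. Tolkien ::: A hobbit goes on an adventure."
)
print(book["title"], book["year"])  # The Hobbit 1937
```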
- Course: Orchestrating Workflows for GenAI Applications
- Airflow Documentation
- Airflow guides and tutorials
- Astro free trial: Sign up for a free trial of Astro to deploy your dags to Astronomer.
- Lesson 2:
- Lesson 3:
  - Introduction to the TaskFlow API and Airflow decorators: Learn more about decorators generally in Python and specifically in Airflow.
  - Manage task and task group dependencies in Airflow: Learn more about setting dependencies between tasks using the `chain` function and other methods.
  - Airflow Operators: Learn more about operator classes, which can be used alongside `@task` to create Airflow tasks.
- Lesson 4:
  - Connections & Hooks in the Airflow documentation
  - Airflow Weaviate Provider Package: Documentation of the Airflow Weaviate Provider Package, which includes the `WeaviateHook`.
  - Airflow Hooks: Learn about Airflow hooks like the `WeaviateHook`.
  - Manage connections in Apache Airflow: Learn about the different ways to connect Airflow to other tools.
  - Strategies for custom XCom backends in Airflow: Learn how to save data that is passed between tasks in different storage systems.
- Lesson 5:
  - Schedule DAGs in Apache Airflow®: Learn all the different ways of scheduling Airflow dags.
  - DAG-level parameters in Airflow: A comprehensive list of dag parameters in Airflow.
  - Assets and data-aware scheduling in Airflow: Learn how to create advanced data-aware schedules using Assets in Airflow.
  - Access the Apache Airflow context: Learn how to interact with the Airflow context dictionary retrieved with `**context`.
  - Create and use params in Airflow: Learn how to create advanced `params` dictionaries for your Airflow dags.
  - Airflow REST API - Create Asset Event: You can update Assets from outside of Airflow using the Airflow REST API.
- Lesson 6:
  - Create dynamic Airflow tasks: Learn all about dynamic task mapping in Airflow.
    - Tip: you can limit the number of concurrently running mapped task instances using the task-level parameters `max_active_tis_per_dag` and `max_active_tis_per_dagrun`.
  - Airflow configuration reference - AIRFLOW__CORE__MAX_MAP_LENGTH: By default, you can have up to 1024 dynamically mapped instances per task. Use this configuration environment variable to modify that limit.
- Lesson 7:
  - Airflow trigger rules: A reference of all available trigger rules.
  - Manage Apache Airflow® DAG notifications: Learn about different ways to let Airflow notify you of task and dag states, including notifier classes.
  - Airflow Apprise provider: Documentation for the Airflow Apprise provider, which integrates with many notification tools.
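The Lesson 6 tip above is about capping how many mapped task instances run at once. Outside Airflow, the same idea can be shown as a plain-Python analogy (this is not Airflow code): map a function over a list of inputs while a worker cap limits concurrency, the way `max_active_tis_per_dag` limits concurrent mapped instances.

```python
from concurrent.futures import ThreadPoolExecutor

def embed(text: str) -> int:
    # Stand-in for per-item work, e.g. embedding one book description.
    return len(text)

descriptions = ["book one", "book two", "book three"]

# max_workers plays the role of max_active_tis_per_dag: at most 2 of the
# mapped "instances" run at the same time, but all inputs are processed.
with ThreadPoolExecutor(max_workers=2) as pool:
    lengths = list(pool.map(embed, descriptions))

print(lengths)  # [8, 8, 10]
```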

