| 1 | +--- |
| 2 | +title: Airflow Tutorial |
| 3 | +template: basepage |
| 4 | +sidebar_position: 2 |
| 5 | +--- |
| 6 | + |
| 7 | +import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; |
| 8 | + |
| 9 | +## Table of Contents |
| 10 | + |
| 11 | +1. [Prerequisites](#prerequisites) |
| 12 | +2. [Get and Start Marquez](#get-and-start-marquez) |
| 13 | +3. [Configure Airflow to send events to Marquez](#configure-airflow-to-send-events-to-marquez) |
| 14 | +4. [View Airflow lineage in Marquez](#view-airflow-lineage-in-marquez) |
| 15 | +5. [Next Steps](#next-steps) |
| 16 | +6. [Feedback](#feedback) |
| 18 | + |
| 19 | +## Prerequisites {#prerequisites} |
| 20 | + |
| 21 | +Before you begin, make sure you have installed: |
| 22 | + |
| 23 | +<Tabs groupId="prereqs"> |
| 24 | +<TabItem value="macos" label="MacOS/Linux"> |
| 25 | + |
| 26 | +* [Docker 17.05](https://docs.docker.com/install)+ |
| 27 | +* [Docker Compose](https://docs.docker.com/compose/install) |
| 28 | +* [Airflow 2.8+](https://airflow.apache.org/docs/apache-airflow/stable/start.html) |
| 29 | + |
| 30 | +</TabItem> |
| 31 | +<TabItem value="windows" label="Windows"> |
| 32 | + |
| 33 | +* [Git Bash](https://gitforwindows.org/) |
| 34 | +* [PostgreSQL 14](https://www.postgresql.org/) |
| 35 | +* [Docker 17.05](https://docs.docker.com/install)+ |
| 36 | +* [Docker Compose](https://docs.docker.com/compose/install) |
| 37 | +* [Airflow 2.8+](https://airflow.apache.org/docs/apache-airflow/stable/start.html) |
| 38 | + |
| 39 | +</TabItem> |
| 40 | +</Tabs> |
| 41 | + |
| 42 | +## Get and Start Marquez |
| 43 | + |
| 44 | +To check out the Marquez source code, run: |
| 45 | + |
| 46 | +<Tabs groupId="get"> |
| 47 | +<TabItem value="macos" label="MacOS/Linux"> |
| 48 | + |
| 49 | +```bash |
| 50 | +$ git clone https://github.com/MarquezProject/marquez && cd marquez |
| 51 | +``` |
| 52 | + |
| 53 | +</TabItem> |
| 54 | +<TabItem value="windows" label="Windows"> |
| 55 | + |
| 56 | +```bash |
| 57 | +$ git config --global core.autocrlf false |
| 58 | +$ git clone https://github.com/MarquezProject/marquez && cd marquez |
| 59 | +``` |
| 60 | + |
| 61 | +</TabItem> |
| 62 | +</Tabs> |
| 63 | + |
| 64 | +Both Airflow and Marquez use port 5432 by default for their metastores. Because the Marquez services are easier to reconfigure on the fly, start Marquez with an alternate database port supplied to the `--db-port` option: |
| 65 | + |
| 66 | +<Tabs groupId="start"> |
| 67 | +<TabItem value="macos" label="MacOS/Linux"> |
| 68 | + |
| 69 | +```bash |
| 70 | +$ ./docker/up.sh --db-port 2345 |
| 71 | +``` |
| 72 | + |
| 73 | +</TabItem> |
| 74 | +<TabItem value="windows" label="Windows"> |
| 75 | + |
| 76 | +Verify that Postgres and Bash are in your `PATH`, then run: |
| 77 | + |
| 78 | +```bash |
| 79 | +$ sh ./docker/up.sh --db-port 2345 |
| 80 | +``` |
| 81 | + |
| 82 | +</TabItem> |
| 83 | +</Tabs> |
| 84 | + |
| 85 | +To view the Marquez UI and verify it's running, open [http://localhost:3000](http://localhost:3000). The UI enables you to discover dependencies between jobs and the datasets they produce and consume via the lineage graph, view run-level metadata of current and previous job runs, and much more! |
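
If you prefer to check from the terminal, the following is a minimal sketch that probes the Marquez HTTP API instead of the UI. It assumes the API listens on port 5000 (the port used in the transport URL in this guide) and that `curl` is installed; `MARQUEZ_URL` and `marquez_status` are illustrative variable names, not Marquez settings.

```bash
# Assumption: the Marquez API is served on port 5000 of the local machine.
MARQUEZ_URL="http://localhost:5000"

# Ask the API for its namespaces; any successful response means the API is up.
# --fail makes curl return a non-zero exit code on HTTP errors.
if curl --silent --fail --max-time 5 "${MARQUEZ_URL}/api/v1/namespaces" > /dev/null; then
  marquez_status="reachable"
else
  marquez_status="unreachable"
fi

echo "Marquez API at ${MARQUEZ_URL} is ${marquez_status}"
```

If the status is `unreachable`, confirm that the containers started by `./docker/up.sh` are still running before continuing.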
| 86 | + |
| 87 | +## Configure Airflow to send events to Marquez |
| 88 | + |
| 89 | +To configure Airflow to emit OpenLineage events to Marquez, you need to define an OpenLineage transport. This is easy to do with an environment variable. Run: |
| 90 | + |
| 91 | +```bash |
| 92 | +$ export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}' |
| 93 | +``` |
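
Because the transport is passed as a JSON string, a quoting mistake in the `export` command is easy to miss. As a rough sanity check, assuming `python3` is on your `PATH`, you can parse the variable and print the transport type. The `transport_type` variable is only an illustrative name.

```bash
export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}'

# Parse the value as JSON and extract the "type" key.
# A parse error here usually points to a quoting problem in the export above.
transport_type=$(python3 -c 'import json, os; print(json.loads(os.environ["AIRFLOW__OPENLINEAGE__TRANSPORT"])["type"])')

echo "transport type: ${transport_type}"
```

Run this in the same shell session that will start Airflow, since an exported variable is visible only to that shell and its child processes.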
| 94 | + |
| 95 | +To add the required Airflow OpenLineage Provider package to your Airflow environment, run: |
| 96 | + |
| 97 | +```bash |
| 98 | +$ pip install apache-airflow-providers-openlineage |
| 99 | +``` |
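
To confirm the package landed in the environment Airflow actually uses, you can ask `pip` about it. This is a small sketch that assumes the `pip` on your `PATH` belongs to the same Python environment as Airflow; `PROVIDER_PKG` is an illustrative variable name.

```bash
PROVIDER_PKG="apache-airflow-providers-openlineage"

# `pip show` exits with a non-zero status when the package is not installed.
if pip show "${PROVIDER_PKG}" > /dev/null 2>&1; then
  echo "${PROVIDER_PKG} is installed"
else
  echo "${PROVIDER_PKG} is not installed in this environment"
fi
```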
| 100 | + |
| 101 | +Run a DAG in Airflow. To verify that the OpenLineage Provider is configured correctly, check the task logs for an `INFO`-level message reporting the transport type you defined (`http` in this example). |
| 102 | + |
| 103 | + |
| 104 | + |
| 105 | +## View Airflow lineage in Marquez |
| 106 | + |
| 107 | + |
| 108 | + |
| 109 | + |
| 110 | + |
| 111 | + |
| 112 | + |
| 113 | + |
| 114 | + |
| 115 | +## Next Steps {#next-steps} |
| 116 | + |
| 117 | + |
| 118 | + |
| 119 | +## Feedback {#feedback} |
| 120 | + |
| 121 | +What did you think of this guide? You can reach out to us on [Slack](https://join.slack.com/t/marquezproject/shared_invite/zt-2iylxasbq-GG_zXNcJdNrhC9uUMr3B7A) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions! |
| 122 | + |