Commit 47f8590

Airflow tutorial stub with screenshots.
Signed-off-by: merobi-hub <[email protected]>
1 parent 7fb3d5e commit 47f8590

6 files changed

+122
-0
lines changed

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
---
title: Airflow Tutorial
template: basepage
sidebar_position: 2
---

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Get and Start Marquez](#get-and-start-marquez)
3. [Configure Airflow to Send Events to Marquez](#configure-airflow-to-send-events-to-marquez)
4. [View Airflow Lineage in Marquez](#view-airflow-lineage-in-marquez)
5. [Next Steps](#next-steps)
6. [Feedback](#feedback)

## Prerequisites {#prerequisites}

Before you begin, make sure you have installed:

<Tabs groupId="prereqs">
<TabItem value="macos" label="MacOS/Linux">

* [Docker 17.05+](https://docs.docker.com/install)
* [Docker Compose](https://docs.docker.com/compose/install)
* [Airflow 2.8+](https://airflow.apache.org/docs/apache-airflow/stable/start.html)

</TabItem>
<TabItem value="windows" label="Windows">

* [Git Bash](https://gitforwindows.org/)
* [PostgreSQL 14](https://www.postgresql.org/)
* [Docker 17.05+](https://docs.docker.com/install)
* [Docker Compose](https://docs.docker.com/compose/install)
* [Airflow 2.8+](https://airflow.apache.org/docs/apache-airflow/stable/start.html)

</TabItem>
</Tabs>

## Get and Start Marquez

To check out the Marquez source code, run:

<Tabs groupId="get">
<TabItem value="macos" label="MacOS/Linux">

```bash
$ git clone https://github.com/MarquezProject/marquez && cd marquez
```

</TabItem>
<TabItem value="windows" label="Windows">

```bash
$ git config --global core.autocrlf false
$ git clone https://github.com/MarquezProject/marquez && cd marquez
```

</TabItem>
</Tabs>

Both Airflow and Marquez require port 5432 for their metastores, but the Marquez services are much easier to configure on the fly, so start Marquez with an alternate database port supplied to the `--db-port` option:

<Tabs groupId="start">
<TabItem value="macos" label="MacOS/Linux">

```bash
$ ./docker/up.sh --db-port 2345
```

</TabItem>
<TabItem value="windows" label="Windows">

Verify that Postgres and Bash are in your `PATH`, then run:

```bash
$ sh ./docker/up.sh --db-port 2345
```

</TabItem>
</Tabs>

To view the Marquez UI and verify it's running, open [http://localhost:3000](http://localhost:3000). The UI enables you to discover dependencies between jobs and the datasets they produce and consume via the lineage graph, view run-level metadata of current and previous job runs, and much more!

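If you would rather confirm from the terminal that the backend behind the UI is up as well, a quick probe of the Marquez REST API can help. The sketch below assumes the API listens on port 5000 (the same port used for the transport URL later in this guide) and that `curl` is installed; `marquez_is_up` is a hypothetical helper name, not something Marquez ships.

```bash
# Hypothetical helper: returns 0 if the Marquez API answers, non-zero otherwise.
marquez_is_up() {
  # $1 is the API base URL; the default below is an assumption about your setup.
  api_url="${1:-http://localhost:5000}"
  # --fail makes curl exit non-zero on HTTP errors; --max-time avoids hanging.
  curl --silent --fail --max-time 5 --output /dev/null "$api_url/api/v1/namespaces"
}

if marquez_is_up; then
  echo "Marquez API is reachable"
else
  echo "Marquez API is not reachable yet"
fi
```

The containers can take a little while to start, so a negative result right after running `up.sh` may simply mean the services are still coming up.
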
## Configure Airflow to Send Events to Marquez

To configure Airflow to emit OpenLineage events to Marquez, you need to define an OpenLineage transport. The simplest way to do this is with an environment variable. Run:

```bash
$ export AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}'
```

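Because the transport is a JSON string, quoting mistakes are easy to make. As a minimal sketch, the same value can be assembled from two shell variables and printed for inspection. `MARQUEZ_URL` and `LINEAGE_ENDPOINT` are illustrative names chosen here; neither Airflow nor Marquez reads them.

```bash
# Illustrative variables (assumptions): adjust to wherever your Marquez API listens.
MARQUEZ_URL="http://localhost:5000"
LINEAGE_ENDPOINT="api/v1/lineage"

# Assemble the same JSON document as the export command above.
AIRFLOW__OPENLINEAGE__TRANSPORT=$(printf '{"type": "http", "url": "%s", "endpoint": "%s"}' \
  "$MARQUEZ_URL" "$LINEAGE_ENDPOINT")
export AIRFLOW__OPENLINEAGE__TRANSPORT

# Print the value so you can check it before starting Airflow.
echo "$AIRFLOW__OPENLINEAGE__TRANSPORT"
```

Whichever form you use, export the variable in the same shell session that starts your Airflow scheduler and workers so that those processes inherit it.
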
To add the required Airflow OpenLineage Provider package to your Airflow environment, run:

```bash
$ pip install apache-airflow-providers-openlineage
```

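To double-check that the package landed in the environment Airflow actually runs in, you can ask `pip` for it. This is only a sketch: `package_installed` is a hypothetical helper, and it assumes the `pip` on your `PATH` belongs to the same Python environment as Airflow.

```bash
# Hypothetical helper: succeeds only if pip can find the named package.
package_installed() {
  pip show "$1" > /dev/null 2>&1
}

if package_installed apache-airflow-providers-openlineage; then
  echo "OpenLineage provider is installed"
else
  echo "OpenLineage provider is missing from this environment"
fi
```
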
Run a DAG in Airflow. To verify that the OpenLineage Provider is configured correctly, check the task logs for an `INFO`-level message reporting the transport type you defined:

![Airflow task logs](airflow_task_logs.png)

## View Airflow Lineage in Marquez

![Marquez DataOps view](marquez_dataops.png)

![Marquez jobs drawer](marquez_jobs_drawer.png)

![Marquez jobs view](marquez_jobs_view.png)

![Marquez events view](marquez_events.png)

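The metadata behind these views is also available from the Marquez REST API, which can be handy for a quick check without opening a browser. The sketch below assumes the API is at `http://localhost:5000` and that your DAG's events were recorded under the `default` namespace (an assumption; use your own namespace if you configured one). `jobs_url` is a hypothetical helper.

```bash
# Assumed values: change these to match your Marquez instance and namespace.
MARQUEZ_URL="http://localhost:5000"
NAMESPACE="default"

# Hypothetical helper: builds the jobs listing URL for a base URL and namespace.
jobs_url() {
  printf '%s/api/v1/namespaces/%s/jobs' "$1" "$2"
}

# List the jobs Marquez knows about; print a hint if the API does not answer.
curl --silent --fail --max-time 5 "$(jobs_url "$MARQUEZ_URL" "$NAMESPACE")" \
  || echo "Marquez API did not respond; is it running?"
```

If the DAG run emitted events successfully, the JSON response should include entries for the DAG and its tasks.
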
## Next Steps {#next-steps}

## Feedback {#feedback}

What did you think of this guide? You can reach out to us on [Slack](https://join.slack.com/t/marquezproject/shared_invite/zt-2iylxasbq-GG_zXNcJdNrhC9uUMr3B7A) and leave us feedback, or [open a pull request](https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md#submitting-a-pull-request) with your suggestions!