Both Airflow and Marquez require port 5432 for their metastores, but the Marquez services are much easier to configure, even on the fly. So start Marquez with an alternate port:
64
+
Both Airflow and Marquez require port 5432 for their metastores, but the Marquez services are easier to configure. You can also assign the database service to a new port on the fly. To start Marquez using port 2345 for the database, run:
65
65
66
66
<Tabs groupId="start">
67
67
<TabItem value="macos" label="MacOS/Linux">
@@ -82,42 +82,47 @@ $ sh ./docker/up.sh --db-port 2345
82
82
</TabItem>
83
83
</Tabs>
84
84
85
-
To view the Marquez UI and verify it's running, open [http://localhost:3000](http://localhost:3000). The UI allow you to discover dependencies between jobs and the datasets they produce and consume via the lineage graph, view run-level metadata of current and previous job runs, and get a high-level view of current and historical operations.
85
+
To view the Marquez UI and verify it's running, open [http://localhost:3000](http://localhost:3000). The UI allows you to:
86
+
- discover cross-platform dependencies, so you can see the jobs across the tools in your ecosystem that produce or consume a critical table.
87
+
- view run-level metadata of current and previous job runs, enabling you to see the latest status of a job and the update history of a dataset.
88
+
- get a high-level view of resource usage, allowing you to see trends in your operations.
86
89
87
90
## Configure Airflow to send events to Marquez {#configure-airflow}
88
91
89
-
1. To configure Airflow to emit OpenLineage events to Marquez, you need to define an OpenLineage transport and namespace. This is easy to do using environment variables. Run:
92
+
1. To configure Airflow to emit OpenLineage events to Marquez, you need to define an OpenLineage transport. One way you can do this is by using an environment variable. To use `http` and send events to the Marquez API running locally on port `5000`, run:
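The command itself falls outside this hunk. As a rough sketch only, assuming the OpenLineage provider reads its transport from an `AIRFLOW__OPENLINEAGE__TRANSPORT` environment variable holding a JSON object, and assuming the Marquez lineage endpoint is `api/v1/lineage` (neither name appears in the text above), the value could be assembled like this:

```python
import json
import shlex


def build_transport(url="http://localhost:5000", endpoint="api/v1/lineage"):
    """Return the JSON string for an http transport.

    The endpoint path is an assumption, not taken from this page.
    """
    return json.dumps({"type": "http", "url": url, "endpoint": endpoint})


def export_line(value, var="AIRFLOW__OPENLINEAGE__TRANSPORT"):
    """Return a shell export statement with the value safely quoted.

    The variable name is an assumption about the provider's configuration.
    """
    return f"export {var}={shlex.quote(value)}"


if __name__ == "__main__":
    # Print a line that could be pasted into the shell that starts Airflow.
    print(export_line(build_transport()))
```

Building the JSON with `json.dumps` and quoting it with `shlex.quote` avoids hand-escaping the nested quotes in the shell.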
7. Run your DAG. To verify that the OpenLineage Provider is configured correctly, check the task logs for an `INFO`-level log reporting the transport type you defined: `OpenLineageClient will use http transport`.
228
+
This DAG is scheduled on the first DAG by means of an Airflow Dataset, so it runs automatically when `Flaky DAG` completes a run.
229
+
230
+
8. Run your DAGs by triggering the `Flaky DAG`. To verify that the OpenLineage Provider is configured correctly, check the task logs for an `INFO`-level log reporting the transport type you defined. In this case, the log will say: `OpenLineageClient will use http transport`.
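If you have a task log saved as text, the check described above can be scripted. This is a minimal sketch: the sample log line format is hypothetical, and only the quoted message comes from the step above.

```python
EXPECTED = "OpenLineageClient will use http transport"


def transport_logged(log_text, expected=EXPECTED):
    """Return True if any INFO-level line in the log contains the expected message."""
    return any(
        "INFO" in line and expected in line
        for line in log_text.splitlines()
    )


if __name__ == "__main__":
    # Hypothetical log excerpt; a real task log will be formatted differently.
    sample = (
        "[2024-01-01 10:00:00] INFO - Starting task\n"
        "[2024-01-01 10:00:01] INFO - OpenLineageClient will use http transport\n"
    )
    print(transport_logged(sample))
```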
180
231
181
232
## View Airflow operational analytics and data lineage in Marquez {#view-airflow}
182
233
183
-
The DataOps view offers a high-level view of historical and in-process operations, including task-level run status and runtime information at a glance:
234
+
The DataOps view offers a high-level view of historical and in-process operations, including task-level run status and runtime information:
184
235
185
236

186
237
187
238
### Datasets lineage graph
188
239
189
-
In the Datasets view, you can click on a dataset to inspect a cross-platfrom-capable lineage graph. In this case, you can view the upstream tasks feeding the `airflowsample` table in Airflow:
240
+
In the Datasets view, click on the dataset to get a cross-platform-capable lineage graph. In this case, you will be able to see the upstream tasks across the two DAGs in your environment that feed the `airflowsample` table in Airflow:
190
241
191
-

242
+

192
243
193
244
:::info
194
245
195
-
Dependencies in other platforms that modify or consume the same dataset will also appear in the graph.
246
+
Dependencies in other platforms that modify or consume the same dataset will also appear in this graph.
196
247
197
248
:::
198
249
199
-
### Dataset details
250
+
### Leveraging the Marquez graph
200
251
201
-
Click on the dataset node for a more details, including the schema, the time of the most recent update, and any facets in the OpenLineage event payload:
252
+
If the `airflowsample` table were to get stale (imagine that), you would need to know about all the upstream dependencies in order to diagnose and resolve the issue efficiently.
202
253
203
-

254
+
In the graph, you can click on an upstream job node to see information including:
255
+
- the latest run status.
256
+
- the last runtime.
257
+
- the time last started.
258
+
- the time last finished.
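To make these four fields concrete, here is a small sketch that derives them from a run record. The dictionary keys (`state`, `startedAt`, `endedAt`) and the sample values are assumptions for illustration, not a confirmed Marquez payload.

```python
from datetime import datetime


def summarize_run(run):
    """Derive status, runtime, start time, and finish time from a run record.

    Assumes ISO 8601 timestamps with an explicit UTC offset.
    """
    started = datetime.fromisoformat(run["startedAt"])
    finished = datetime.fromisoformat(run["endedAt"])
    return {
        "status": run["state"],
        "runtime_seconds": (finished - started).total_seconds(),
        "started": started,
        "finished": finished,
    }


if __name__ == "__main__":
    # Hypothetical run record, for illustration only.
    sample_run = {
        "state": "COMPLETED",
        "startedAt": "2024-01-01T10:00:00+00:00",
        "endedAt": "2024-01-01T10:02:30+00:00",
    }
    print(summarize_run(sample_run))
```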
204
259
205
-
### Versioned schema history
260
+

206
261
207
-
Click on the versions tab in the drawer for a versioned schema history:
262
+
You can also access a versioned table schema history from the Marquez graph, so you can see at a glance if data quality in a critical table has become compromised and when a loss occurred: