Context
Since the 0.7.0 release (#1906), Marquez supports pushing metrics to Prometheus.
This task proposes extending that capability to give visibility into Marquez's SQL queries. Some of the questions we'd like answered:
- What queries is Marquez running?
- How long does each query take?
- How many times does a specific query run?
Identifying potential bottlenecks in Marquez's queries and in the database would help us provision adequate resources. That, in turn, could improve the performance and efficiency of both the database and Marquez itself.
Implementation
If possible, we could expose the frequency (count) and duration (gauge) of every query Marquez runs. It may be possible to do this at the Jdbi layer: https://metrics.dropwizard.io/4.2.0/manual/jdbi.html
If this is not possible, we could add the instrumentation to specific write and read endpoints, covering at least the SQL queries triggered by the following endpoints:
- POST api/v1/lineage (*)
- GET api/v1/namespaces/{namespace} (*)
- GET api/v1/namespaces
- GET api/v1/namespaces/{namespace}/jobs/{job}
- GET api/v1/namespaces/{namespace}/datasets
- GET api/v1/column-lineage
The most critical endpoints are marked with (*).
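Whichever option we pick, the core of the instrumentation is the same: for each named query, record how many times it ran and how long it took. Below is a minimal, dependency-free sketch of that idea. The `QueryMetrics` class, its method names, and the `NamespaceDao.findAll` label are hypothetical and only for illustration; a real implementation would presumably publish these values through the existing Prometheus metrics rather than keep them in in-memory maps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical per-query recorder; not part of Marquez, Jdbi, or Dropwizard. */
final class QueryMetrics {
  // Number of executions per query name.
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  // Accumulated wall-clock time per query name, in nanoseconds.
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

  /** Runs the query, then records one execution and its elapsed time. */
  <T> T time(String queryName, Supplier<T> query) {
    long start = System.nanoTime();
    try {
      return query.get();
    } finally {
      // Recorded even when the query throws, so failures are counted too.
      long elapsed = System.nanoTime() - start;
      counts.computeIfAbsent(queryName, k -> new LongAdder()).increment();
      totalNanos.computeIfAbsent(queryName, k -> new LongAdder()).add(elapsed);
    }
  }

  /** How many times the named query has run (0 if never seen). */
  long count(String queryName) {
    LongAdder c = counts.get(queryName);
    return c == null ? 0L : c.sum();
  }

  /** Total time spent in the named query, in nanoseconds (0 if never seen). */
  long totalNanos(String queryName) {
    LongAdder t = totalNanos.get(queryName);
    return t == null ? 0L : t.sum();
  }
}
```

A call site would wrap the DAO call, for example `metrics.time("NamespaceDao.findAll", () -> dao.findAll())`, where `dao` stands in for whatever object runs the query. Keying by a stable query name rather than by raw SQL text keeps the number of distinct metric series bounded.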