Expose HTTP endpoint SQL queries, queries count and execution time via Prometheus #2828

@tatiana

Context

Since the 0.7.0 release (#1906), Marquez has supported exporting metrics to Prometheus.

This task proposes extending that capability to give visibility into Marquez's SQL queries. Some of the questions we'd like answered:

  • What queries is Marquez running?
  • How long does each query take?
  • How many times does a specific query run?

Identifying bottlenecks in Marquez's queries and in the database would make it easier to provision adequate resources, which in turn could improve the performance and efficiency of both the database and Marquez itself.

Implementation

If possible, we could expose the frequency (count) and duration (gauge) of every query Marquez runs. This could possibly be done at the JDBI layer: https://metrics.dropwizard.io/4.2.0/manual/jdbi.html
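As a rough illustration of the count-plus-duration idea, here is a minimal JDK-only sketch. It is not Marquez code: the `QueryMetrics` class, the metric names `marquez_sql_query_executions_total` and `marquez_sql_query_seconds_sum`, and the query name used below are all hypothetical. In Marquez, `record` would presumably be called from a JDBI statement hook (or replaced by the Dropwizard JDBI integration linked above) rather than by hand.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical per-query registry: execution count and total elapsed time. */
final class QueryMetrics {
  private static final class Stats {
    final LongAdder count = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
  }

  private final Map<String, Stats> statsByQuery = new ConcurrentHashMap<>();

  /** Runs the query, records its duration under queryName, and returns its result. */
  <T> T time(String queryName, Supplier<T> query) {
    long start = System.nanoTime();
    try {
      return query.get();
    } finally {
      record(queryName, System.nanoTime() - start);
    }
  }

  /** Records one execution of queryName that took elapsedNanos. */
  void record(String queryName, long elapsedNanos) {
    Stats stats = statsByQuery.computeIfAbsent(queryName, k -> new Stats());
    stats.count.increment();
    stats.totalNanos.add(elapsedNanos);
  }

  long count(String queryName) {
    Stats stats = statsByQuery.get(queryName);
    return stats == null ? 0 : stats.count.sum();
  }

  long totalNanos(String queryName) {
    Stats stats = statsByQuery.get(queryName);
    return stats == null ? 0 : stats.totalNanos.sum();
  }

  /** Renders the metrics as Prometheus-style text lines (metric names are assumptions). */
  String render() {
    Map<String, Stats> sorted = new TreeMap<>(statsByQuery);
    StringBuilder out = new StringBuilder();
    // Emit each metric family as one contiguous group of lines.
    for (Map.Entry<String, Stats> e : sorted.entrySet()) {
      out.append("marquez_sql_query_executions_total{query=\"")
          .append(escape(e.getKey())).append("\"} ")
          .append(e.getValue().count.sum()).append('\n');
    }
    for (Map.Entry<String, Stats> e : sorted.entrySet()) {
      out.append("marquez_sql_query_seconds_sum{query=\"")
          .append(escape(e.getKey())).append("\"} ")
          .append(e.getValue().totalNanos.sum() / 1e9).append('\n');
    }
    return out.toString();
  }

  /** Escapes backslash, double quote and newline inside a label value. */
  private static String escape(String value) {
    return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }
}
```

Wrapping a DAO call as `metrics.time("namespace.find", () -> dao.find(name))` (with `dao` standing in for any existing data-access object) would answer the "how many times" and "how long" questions per query name. Tracking a running sum and count rather than a single gauge value is a design choice made here so that average duration can be derived at query time.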

If this is not possible, we could add the instrumentation to specific write and read endpoints, covering at least the SQL queries triggered by the following endpoints:

  • POST api/v1/lineage (*)
  • GET api/v1/namespaces/{namespace} (*)
  • GET api/v1/namespaces
  • GET api/v1/namespaces/{namespace}/jobs/{job}
  • GET api/v1/namespaces/{namespace}/datasets
  • GET api/v1/column-lineage

The most critical endpoints are marked with (*).
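If instrumentation is added per endpoint, metrics would likely need to be labeled by the endpoint template rather than by the raw request path, so that every namespace or job name does not create its own time series. The sketch below shows one way to map a concrete request onto the templates listed above. The `EndpointLabels` class is hypothetical, uses only the JDK, and assumes each `{param}` matches exactly one path segment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: maps "METHOD path" to one of a fixed set of endpoint templates. */
final class EndpointLabels {
  private final Map<Pattern, String> templatesByPattern = new LinkedHashMap<>();

  /** Each template looks like "GET api/v1/namespaces/{namespace}". */
  EndpointLabels(String... templates) {
    for (String template : templates) {
      StringBuilder regex = new StringBuilder();
      for (String part : template.split("/")) {
        if (regex.length() > 0) {
          regex.append('/');
        }
        // A {param} segment matches any single path segment; everything else is literal.
        boolean isParam = part.startsWith("{") && part.endsWith("}");
        regex.append(isParam ? "[^/]+" : Pattern.quote(part));
      }
      templatesByPattern.put(Pattern.compile(regex.toString()), template);
    }
  }

  /** Returns the matching template, or empty if the request is not an instrumented endpoint. */
  Optional<String> labelFor(String method, String path) {
    String normalized = path.startsWith("/") ? path.substring(1) : path;
    String key = method + " " + normalized;
    for (Map.Entry<Pattern, String> e : templatesByPattern.entrySet()) {
      if (e.getKey().matcher(key).matches()) {
        return Optional.of(e.getValue());
      }
    }
    return Optional.empty();
  }
}
```

For example, with the templates from the list above registered, `labelFor("GET", "/api/v1/namespaces/my_namespace/jobs/my_job")` resolves to `GET api/v1/namespaces/{namespace}/jobs/{job}`, which could then serve as a metric label for the SQL queries that request triggers.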

Metadata

Labels: db.perf (This issue or pull request improves DB performance)

Status: Done