diff --git a/docs/cloud/metrics/index.mdx b/docs/cloud/metrics/index.mdx index 1c82ffd42e..2eb5600d89 100644 --- a/docs/cloud/metrics/index.mdx +++ b/docs/cloud/metrics/index.mdx @@ -45,3 +45,5 @@ Cloud Metrics for all Namespaces in your account are available from two sources: OpenMetrics is the recommended option for most users. ::: + +For setting up SDK metrics emitted by your Workers and Clients, see [SDK metrics setup](/cloud/metrics/sdk-metrics-setup). diff --git a/docs/cloud/metrics/prometheus-grafana.mdx b/docs/cloud/metrics/prometheus-grafana.mdx index efa7a487ab..7eda24a9d9 100644 --- a/docs/cloud/metrics/prometheus-grafana.mdx +++ b/docs/cloud/metrics/prometheus-grafana.mdx @@ -2,25 +2,21 @@ id: prometheus-grafana title: Prometheus Grafana setup sidebar_label: Prometheus Grafana -description: Set up Grafana with Temporal Cloud observability to monitor performance and troubleshoot errors. Use Prometheus API endpoints and SDK metrics for efficient, real-time insights. +description: Set up Grafana with Temporal Cloud observability to monitor performance and troubleshoot errors using the Prometheus HTTP API endpoint. slug: /cloud/metrics/prometheus-grafana toc_max_heading_level: 4 keywords: - grafana temporal integration - temporal cloud observability - prometheus temporal cloud - - temporal sdk metrics - grafana prometheus setup - temporal cloud grafana dashboard - - prometheus scrape endpoint - grafana data source setup - - temporal sdk monitoring - grafana monitoring workflows - prometheus metrics visualization - observability with grafana - workflow metrics grafana - grafana prometheus integration - - temporal sdk metrics setup tags: - Metrics - Observability @@ -29,23 +25,21 @@ tags: import { ZoomingImage } from '@site/src/components'; -**How to set up Grafana with Temporal Cloud observability to view metrics.** - -Temporal Cloud and SDKs generate metrics for monitoring performance and troubleshooting errors. 
+**How to set up Grafana with the Temporal Cloud PromQL endpoint to view Cloud metrics.** Temporal Cloud emits metrics through a [Prometheus HTTP API endpoint](https://prometheus.io/docs/prometheus/latest/querying/api/), which can be directly used as a Prometheus data source in Grafana or to query and export Cloud metrics to any observability platform. -The open-source SDKs require you to set up a Prometheus scrape endpoint for Prometheus to collect and aggregate the Worker and Client metrics. +:::note + +For setting up SDK metrics (emitted by your Workers and Clients), see [SDK metrics setup](/cloud/metrics/sdk-metrics-setup). -This section describes how to set up your Temporal Cloud and SDK metrics and use them as data sources in Grafana. +::: -The process for setting up observability includes the following steps: +The process for setting up Temporal Cloud PromQL to work with Grafana includes the following steps: -1. Create or get your Prometheus endpoint for Temporal Cloud metrics and enable SDK metrics. - - For Temporal Cloud, [generate a Prometheus HTTP API endpoint](/cloud/metrics/general-setup) on Temporal Cloud using valid certificates. - - For SDKs, [expose a metrics endpoint](#sdk-metrics-setup) where Prometheus can scrape SDK metrics and [run Prometheus](#prometheus-configuration) on your host. The examples in this article describe running Prometheus on your local machine where you run your application code. -2. Run Grafana and [set up data sources for Temporal Cloud and SDK metrics](#grafana-data-sources-configuration) in Grafana. The examples in this article describe running Grafana on your local host where you run your application code. -3. [Create dashboards](#grafana-dashboards-setup) in Grafana to view Temporal Cloud metrics and SDK metrics. Temporal provides [sample community-driven Grafana dashboards](https://github.com/temporalio/dashboards) for Cloud and SDK metrics that you can use and customize according to your requirements. +1. 
[Generate a Prometheus HTTP API endpoint](/cloud/metrics/general-setup) on Temporal Cloud using valid certificates. +2. Run Grafana and [set up a data source for Temporal Cloud metrics](#grafana-data-source-configuration) in Grafana. +3. [Create dashboards](#grafana-dashboards-setup) in Grafana to view Temporal Cloud metrics. Temporal provides [sample community-driven Grafana dashboards](https://github.com/temporalio/dashboards) for Cloud metrics that you can use and customize according to your requirements. If you're following through with the examples provided here, ensure that you have the following: @@ -59,7 +53,7 @@ If you're following through with the examples provided here, ensure that you hav - [TypeScript](/develop/typescript/core-application#connect-to-temporal-cloud) - [.NET](/develop/dotnet/temporal-client#connect-to-temporal-cloud) -- Prometheus and Grafana installed. +- Grafana installed. ## Temporal Cloud metrics setup @@ -84,172 +78,16 @@ The following steps describe how to set up Observability on Temporal Cloud to ge 6. Copy the HTTP API endpoint that is generated (it is shown in the UI). This endpoint should be configured as a data source for Temporal Cloud metrics in Grafana. -See [Data sources configuration for Temporal Cloud and SDK metrics in Grafana](#grafana-data-sources-configuration) for details. - -## SDK metrics setup - -SDK metrics are emitted by SDK Clients used to start your Workers and to start, signal, or query your Workflow Executions. -You must configure a Prometheus scrape endpoint for Prometheus to collect and aggregate your SDK metrics. -Each language development guide has details on how to set this up. 
- -- [Go SDK](/develop/go/observability#metrics) -- [Java SDK](/develop/java/observability#metrics) -- [TypeScript SDK](/develop/typescript/observability#metrics) -- [Python](/develop/python/observability#metrics) -- [.NET](/develop/dotnet/observability#metrics) - -The following example uses the Java SDK to set the Prometheus registry and Micrometer stats reporter, set the scope, and expose an endpoint from which Prometheus can scrape the SDK metrics. - -```java -//You need the following packages to set up metrics in Java. -//See the Developer's guide for packages required for other SDKs. - -//… -import com.sun.net.httpserver.HttpServer; -import com.uber.m3.tally.RootScopeBuilder; -import com.uber.m3.tally.Scope; -import com.uber.m3.util.Duration; -import com.uber.m3.util.ImmutableMap; - -import io.micrometer.prometheus.PrometheusConfig; -import io.micrometer.prometheus.PrometheusMeterRegistry; -import io.temporal.common.reporter.MicrometerClientStatsReporter; - -import java.io.IOException; -import java.io.OutputStream; -import java.net.InetSocketAddress; - -import io.temporal.serviceclient.SimpleSslContextBuilder; -import io.temporal.serviceclient.WorkflowServiceStubs; -import io.temporal.serviceclient.WorkflowServiceStubsOptions; - -import java.io.FileInputStream; -import java.io.InputStream; -//… - { - // See the Micrometer documentation for configuration details on other supported monitoring systems. - // Set up the Prometheus registry. 
- PrometheusMeterRegistry yourRegistry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT); - - public static Scope yourScope(){ - //Set up a scope, report every 10 seconds - Scope yourScope = new RootScopeBuilder() - .tags(ImmutableMap.of( - "customtag1", - "customvalue1", - "customtag2", - "customvalue2")) - .reporter(new MicrometerClientStatsReporter(yourRegistry)) - .reportEvery(Duration.ofSeconds(10)); - - //Start Prometheus scrape endpoint at port 8077 on your local host - HttpServer scrapeEndpoint = startPrometheusScrapeEndpoint(yourRegistry, 8077); - return yourScope; - } - - /** - * Starts HttpServer to expose a scrape endpoint. See - * https://micrometer.io/docs/registry/prometheus for more info. - */ - - public static HttpServer startPrometheusScrapeEndpoint( - PrometheusMeterRegistry yourRegistry, int port) { - try { - HttpServer server = HttpServer.create(new InetSocketAddress(port), 0); - server.createContext( - "/metrics", - httpExchange -> { - String response = registry.scrape(); - httpExchange.sendResponseHeaders(200, response.getBytes(UTF_8).length); - try (OutputStream os = httpExchange.getResponseBody()) { - os.write(response.getBytes(UTF_8)); - } - }); - server.start(); - return server; - } catch (IOException e) { - throw new RuntimeException(e); - } - } -} - -//… - -// With your scrape endpoint configured, set the metrics scope in your Workflow service stub and -// use it to create a Client to start your Workers and Workflow Executions. - -//… -{ - //Create Workflow service stubs to connect to the Frontend Service. - WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs( - WorkflowServiceStubsOptions.newBuilder() - .setMetricsScope(yourScope()) //set the metrics scope for the WorkflowServiceStubs - .build()); - - //Create a Workflow service client, which can be used to start, signal, and query Workflow Executions. 
- WorkflowClient yourClient = WorkflowClient.newInstance(service, - WorkflowClientOptions.newBuilder().build()); -} - -//… -``` - -To check whether your scrape endpoints are emitting metrics, run your code and go to [http://localhost:8077/metrics](http://localhost:8077/metrics) to verify that you see the SDK metrics. - -You can set up separate scrape endpoints in your Clients that you use to start your Workers and Workflow Executions. - -For more examples on setting metrics endpoints in other SDKs, see the metrics samples: - -- [Java SDK Samples](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/metrics) -- [Go SDK Samples](https://github.com/temporalio/samples-go/tree/main/metrics) - -## SDK metrics Prometheus Configuration {#prometheus-configuration} - -**How to configure Prometheus to ingest Temporal SDK metrics.** - -For Temporal SDKs, you must have Prometheus running and configured to listen on the scrape endpoints exposed in your application code. - -For this example, you can run Prometheus locally or as a Docker container. -In either case, ensure that you set the listen targets to the ports where you expose your scrape endpoints. -When you run Prometheus locally, set your target address to port 8077 in your Prometheus configuration YAML file. (We set the scrape endpoint to port 8077 in the [SDK metrics setup](#sdk-metrics-setup) example.) - -Example: - -```yaml -global: - scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute. -#... - -# Set your scrape configuration targets to the ports exposed on your endpoints in the SDK. -scrape_configs: - - job_name: 'temporalsdkmetrics' - metrics_path: /metrics - scheme: http - static_configs: - - targets: - # This is the scrape endpoint where Prometheus listens for SDK metrics. - - localhost:8077 - # You can have multiple targets here, provided they are set up in your application code. 
-``` - -See the [Prometheus documentation](https://prometheus.io/docs/introduction/first_steps/) for more details on how you can run Prometheus locally or using Docker. - -Note that Temporal Cloud exposes metrics through a [Prometheus HTTP API endpoint](https://prometheus.io/docs/prometheus/latest/querying/api/) (not a scrape endpoint) that can be configured as a data source in Grafana. -The Prometheus configuration described here is for scraping metrics data on endpoints for SDK metrics only. - -To check whether Prometheus is receiving metrics from your SDK target, go to [http://localhost:9090](http://localhost:9090) and navigate to **Status > Targets**. -The status of your target endpoint defined in your configuration appears here. - -## Grafana data sources configuration {#grafana-data-sources-configuration} +See [Grafana data source configuration](#grafana-data-source-configuration) for details. -**How to configure data sources for Temporal Cloud and SDK metrics in Grafana.** +## Grafana data source configuration {#grafana-data-source-configuration} + +**How to configure the Temporal Cloud metrics data source in Grafana.** Depending on how you use Grafana, you can either install and run it locally, run it as a Docker container, or log in to Grafana Cloud to set up your data sources. If you have installed and are running Grafana locally, go to [http://localhost:3000](http://localhost:3000) and sign in. -You must configure your Temporal Cloud and SDK metrics data sources separately in Grafana. - To add the Temporal Cloud Prometheus HTTP API endpoint that we generated in the [Temporal Cloud metrics setup](/cloud/metrics/general-setup) section, do the following: 1. Go to **Configuration > Data sources**. 
@@ -266,52 +104,27 @@ To add the Temporal Cloud Prometheus HTTP API endpoint that we generated in the If you see issues in setting this data source, verify your CA certificate chain and ensure that you are setting the correct certificates in your Temporal Cloud observability setup and in the TLS authentication in Grafana. -To add the SDK metrics Prometheus endpoint that we configured in the [SDK metrics setup](#sdk-metrics-setup) and [Prometheus configuration for SDK metrics](#prometheus-configuration) sections, do the following: - -1. Go to **Configuration > Data sources**. -2. Select **Add data source > Prometheus**. -3. Enter a name for your Temporal Cloud metrics data source, such as _Temporal SDK metrics_. -4. In the **HTTP** section, enter your Prometheus endpoint in the URL field. - If running Prometheus locally as described in the examples in this article, enter `http://localhost:9090`. -5. For this example, enable **Skip TLS Verify** in the **Auth** section. -6. Click **Save and test** to verify that the data source is working. - -If you see issues in setting this data source, check whether the endpoints set in your SDKs are showing metrics. -If you don't see your SDK metrics at the scrape endpoints defined, check whether your Workers and Workflow Executions are running. -If you see metrics on the scrape endpoints, but Prometheus shows your targets are down, then there is an issue with connecting to the targets set in your SDKs. -Verify your Prometheus configuration and restart Prometheus. - -If you're running Grafana as a container, you can set your SDK metrics Prometheus data source in your Grafana configuration. -See the example Grafana configuration described in the [Prometheus and Grafana setup for open-source Temporal Service](/self-hosted-guide/monitoring#grafana) article. - ### Grafana dashboards setup To set up dashboards in Grafana, you can use the UI or configure them directly in your Grafana deployment. 
:::tip -Temporal provides community-driven example dashboards for [Temporal Cloud](https://github.com/temporalio/dashboards/tree/master/cloud) and [Temporal SDKs](https://github.com/temporalio/dashboards/tree/master/sdk) that you can customize to meet your needs. +Temporal provides community-driven [example dashboards for Temporal Cloud](https://github.com/temporalio/dashboards/tree/master/cloud) that you can customize to meet your needs. ::: To import a dashboard in Grafana: 1. In the left-hand navigation bar, select **Dashboards** > **Import dashboard**. -2. You can either copy and paste the JSON from the [Temporal Cloud](https://github.com/temporalio/dashboards/tree/master/cloud) and [Temporal SDK](https://github.com/temporalio/dashboards/tree/master/sdk) sample dashboards, or import the JSON files into Grafana. +2. You can either copy and paste the JSON from the [Temporal Cloud sample dashboards](https://github.com/temporalio/dashboards/tree/master/cloud), or import the JSON files into Grafana. 3. Save the dashboard and review the metrics data in the graphs. To configure dashboards with the UI: 1. Go to **Create > Dashboard** and add an empty panel. -2. On the **Panel configuration** page, in the **Query** tab, select the "Temporal Cloud metrics" or "Temporal SDK metrics" data source that you configured earlier. - If you need to add multiple queries from both data sources, choose `–Mixed–`. -3. Add your metrics queries: - - For Temporal Cloud metrics, expand the **Metrics browser** and select the metrics you want. - You can also select associated labels and values to sort the query data. - The [Cloud metrics documentation](/cloud/metrics/reference) lists all metrics emitted from Temporal Cloud. - - For Temporal SDK metrics, expand the **Metrics browser** and select the metrics you want. - A list of Worker performance metrics is described in the [Developer's Guide - Worker performance](/develop/worker-performance). 
- All SDK-related metrics are listed in the [SDK metrics](/references/sdk-metrics) reference. +2. On the **Panel configuration** page, in the **Query** tab, select the "Temporal Cloud metrics" data source that you configured earlier. +3. Expand the **Metrics browser** and select the metrics you want. + You can also select associated labels and values to sort the query data. + The [PromQL documentation](/cloud/metrics/reference) lists all metrics emitted from PromQL in Temporal Cloud. 4. The graph should now display data based on your selected queries. - Note that SDK metrics will only show if you have Workflow Execution data and running Workers. - If you don't see SDK metrics, run your Worker and Workflow Executions, then monitor the dashboard. diff --git a/docs/cloud/metrics/sdk-metrics-setup.mdx b/docs/cloud/metrics/sdk-metrics-setup.mdx new file mode 100644 index 0000000000..b3119fba6d --- /dev/null +++ b/docs/cloud/metrics/sdk-metrics-setup.mdx @@ -0,0 +1,147 @@ +--- +id: sdk-metrics-setup +title: Monitor SDK metrics with Prometheus and Grafana +sidebar_label: SDK Metrics +description: Set up Temporal SDK metrics with Prometheus and Grafana for monitoring Workers and Client performance. +slug: /cloud/metrics/sdk-metrics-setup +toc_max_heading_level: 4 +keywords: + - temporal sdk metrics + - prometheus scrape endpoint + - sdk metrics setup + - temporal sdk monitoring + - grafana sdk metrics + - worker metrics + - temporal sdk prometheus + - sdk metrics dashboard +tags: + - Metrics + - Observability + - Temporal Cloud +--- + +import { ZoomingImage } from '@site/src/components'; + +SDK metrics are emitted by SDK Clients used to start your Workers and to start, signal, or query your Workflow Executions. +Unlike [Temporal Cloud metrics](/cloud/metrics/), which are exposed through a Prometheus HTTP API endpoint, SDK metrics require you to set up a Prometheus scrape endpoint in your application code for Prometheus to collect and aggregate. 
+ +For a full list of available SDK metrics and their descriptions, see the [SDK metrics reference](/references/sdk-metrics). + +The process for setting up SDK metrics includes the following steps: + +1. [Expose a metrics endpoint](#sdk-metrics-setup) in your application code where Prometheus can scrape SDK metrics. +2. [Configure Prometheus](#prometheus-configuration) to scrape your SDK metrics endpoints. +3. [Add an SDK metrics data source](#grafana-data-source-configuration) in Grafana. +4. [Set up dashboards](#grafana-dashboards-setup) to visualize SDK metrics. + +Set up your connections to Temporal Cloud using an SDK of your choice and have some Workflows running on Temporal Cloud. +Ensure Prometheus and Grafana are installed. + +- [Go](/develop/go/temporal-client#connect-to-temporal-cloud) +- [Java](/develop/java/temporal-client#connect-to-temporal-cloud) +- [Python](/develop/python/temporal-client#connect-to-temporal-cloud) +- [TypeScript](/develop/typescript/core-application#connect-to-temporal-cloud) +- [.NET](/develop/dotnet/temporal-client#connect-to-temporal-cloud) + +## Expose a metrics endpoint {#sdk-metrics-setup} + +You must configure a Prometheus scrape endpoint for Prometheus to collect and aggregate your SDK metrics. +Each language development guide has details on how to set this up. 
+ +- [Go SDK](/develop/go/observability#metrics) +- [Java SDK](/develop/java/observability#metrics) +- [TypeScript SDK](/develop/typescript/observability#metrics) +- [Python](/develop/python/observability#metrics) +- [.NET](/develop/dotnet/observability#metrics) + +For working examples of how to configure metrics in each SDK, see the metrics samples: + +- [Go SDK Samples](https://github.com/temporalio/samples-go/tree/main/metrics) +- [Java SDK Samples](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/metrics) +- [TypeScript SDK Samples](https://github.com/temporalio/samples-typescript/tree/main/interceptors-opentelemetry) +- [Python SDK Samples](https://github.com/temporalio/samples-python/tree/main/custom_metric) +- [.NET SDK Samples](https://github.com/temporalio/samples-dotnet/tree/main/src/OpenTelemetry/DotNetMetrics) + +Some examples use OpenTelemetry to instrument metrics. It is useful to use a +[Prometheus exporter with OpenTelemetry](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter) to expose metrics for scraping. + +## Configure Prometheus {#prometheus-configuration} + +For Temporal SDKs, you must have Prometheus running and configured to listen on the scrape endpoints exposed in your application code. + +For this example, you can run Prometheus locally or as a Docker container. +In either case, ensure that you set the listen targets to the ports where you expose your scrape endpoints. +This configuration assumes the scrape endpoint is set to port 8077 as in the [SDK metrics setup](#sdk-metrics-setup) example. + +```yaml +global: + scrape_interval: 30s # Set the scrape interval to every 30 seconds. Default is every 1 minute. +#... + +# Set your scrape configuration targets to the ports exposed on your endpoints in the SDK. 
+scrape_configs: + - job_name: 'temporalsdkmetrics' + metrics_path: /metrics + scheme: http + static_configs: + - targets: + # This is the scrape endpoint where Prometheus listens for SDK metrics. + - localhost:8077 + # You can have multiple targets here, provided they are set up in your application code. +``` + +See the [Prometheus documentation](https://prometheus.io/docs/introduction/first_steps/) for more details on how you can run Prometheus locally or using Docker. + +To check whether Prometheus is receiving metrics from your SDK target, go to [http://localhost:9090](http://localhost:9090) and navigate to **Status > Targets**. +The status of your target endpoint defined in your configuration appears here. + +## Add an SDK metrics data source in Grafana {#grafana-data-source-configuration} + +Depending on how you use Grafana, you can either install and run it locally, run it as a Docker container, or log in to Grafana Cloud to set up your data sources. + +If you have installed and are running Grafana locally, go to [http://localhost:3000](http://localhost:3000) and sign in. + +To add the SDK metrics Prometheus endpoint as a data source, do the following: + +1. Go to **Configuration > Data sources**. +2. Select **Add data source > Prometheus**. +3. Enter a name for your SDK metrics data source, such as _Temporal SDK metrics_. +4. In the **HTTP** section, enter your Prometheus endpoint in the URL field. + If running Prometheus locally as described in the examples in this article, enter `http://localhost:9090`. +5. For this example, enable **Skip TLS Verify** in the **Auth** section. +6. Click **Save and test** to verify that the data source is working. + +If you see issues in setting this data source, check whether the endpoints set in your SDKs are showing metrics. +If you don't see your SDK metrics at the scrape endpoints defined, check whether your Workers and Workflow Executions are running. 
+If you see metrics on the scrape endpoints, but Prometheus shows your targets are down, then there is an issue with connecting to the targets set in your SDKs. +Verify your Prometheus configuration and restart Prometheus. + +If you're running Grafana as a container, you can set your SDK metrics Prometheus data source in your Grafana configuration. +See the example Grafana configuration described in the [Prometheus and Grafana setup for open-source Temporal Service](/self-hosted-guide/monitoring#grafana) article. + +## Set up Grafana dashboards {#grafana-dashboards-setup} + +To set up SDK metrics dashboards in Grafana, you can use the UI or configure them directly in your Grafana deployment. + +:::tip + +Temporal provides community-driven [example dashboards for Temporal SDKs](https://github.com/temporalio/dashboards/tree/master/sdk) that you can customize to meet your needs. + +::: + +To import a dashboard in Grafana: + +1. In the navigation bar, select **Dashboards** > **Import dashboard**. +2. You can either copy and paste the JSON from the [Temporal SDK sample dashboards](https://github.com/temporalio/dashboards/tree/master/sdk), or import the JSON files into Grafana. +3. Save the dashboard and review the metrics data in the graphs. + +To configure dashboards with the UI: + +1. Go to **Create > Dashboard** and add an empty panel. +2. On the **Panel configuration** page, in the **Query** tab, select the "Temporal SDK metrics" data source that you configured earlier. +3. Expand the **Metrics browser** and select the metrics you want. + A list of Worker performance metrics is described in the [Developer's Guide - Worker performance](/develop/worker-performance). + All SDK-related metrics are listed in the [SDK metrics](/references/sdk-metrics) reference. +4. The graph should now display data based on your selected queries. + Note that SDK metrics will only show if you have Workflow Execution data and running Workers. 
+ If you don't see SDK metrics, run your Worker and Workflow Executions, then monitor the dashboard. diff --git a/sidebars.js b/sidebars.js index eba8256faf..4e601b9b3b 100644 --- a/sidebars.js +++ b/sidebars.js @@ -418,6 +418,7 @@ module.exports = { }, items: ['cloud/metrics/general-setup', 'cloud/metrics/reference', 'cloud/metrics/prometheus-grafana'], }, + 'cloud/metrics/sdk-metrics-setup', ], }, {