
Commit 5c425bd

update docs re: metrics
1 parent 33bd26f commit 5c425bd

2 files changed

Lines changed: 69 additions & 17 deletions


docs/metrics.md

Lines changed: 66 additions & 3 deletions
```diff
@@ -7,6 +7,12 @@ compatible with any .NET metrics listener -- most notably the
 The SDK also includes a built-in Prometheus HTTP listener via `MetricsCollector.StartServer(port)`
 with canonical bucket boundaries pre-configured.
 
+The C# SDK implements the cross-SDK canonical metrics catalog directly. Because the C# metrics
+surface was not released before harmonization, there is no legacy mode and no
+`WORKER_CANONICAL_METRICS` environment variable. Other Conductor SDKs (Python, Go, Java,
+JavaScript, Ruby) that had previously released metrics offer a gated switch between legacy
+and canonical implementations -- that distinction does not apply here.
+
 ## Table of Contents
 
 - [Quick Reference](#quick-reference)
```
```diff
@@ -21,9 +27,11 @@ with canonical bucket boundaries pre-configured.
 - [Time Histograms](#time-histograms)
 - [Size Histograms](#size-histograms)
 - [Gauges](#gauges)
+- [Non-Applicable Metrics](#non-applicable-metrics)
 - [Labels](#labels)
 - [Bucket Boundaries](#bucket-boundaries)
 - [Best Practices](#best-practices)
+- [Troubleshooting](#troubleshooting)
 
 ## Quick Reference
 
```
```diff
@@ -217,6 +225,23 @@ Point-in-time values sampled by the metrics listener.
 |---|---|---|
 | `active_workers` | `taskType` | Number of concurrent task executions in progress. Updated on every poll cycle. |
 
+## Non-Applicable Metrics
+
+The cross-SDK canonical catalog defines additional metrics that are registered in
+`MetricsCollector` as public API surface but are never incremented by the internal worker
+runner. They remain available so that user code can record them with its own semantics.
+
+| Canonical metric | Why N/A for the internal runner |
+|---|---|
+| `task_ack_error_total` | The batch-poll response serves as the ack; there is no separate ack call. |
+| `task_ack_failed_total` | Same reason as `task_ack_error_total`. |
+| `worker_restart_total` | Python-only: the Python SDK's multi-process supervisor restarts child processes, whereas the .NET SDK runs workers as async tasks. |
+| `external_payload_used_total` | The C# client does not yet integrate with Conductor's external-payload-storage API. The counter is registered so user code can call `RecordExternalPayloadUsed()` if it implements its own integration. |
+
+Users cross-referencing the harmonization spec or documentation from other Conductor SDKs may
+see these metrics listed there. Their absence from the C# worker runner's output is
+intentional.
+
 ## Labels
 
 All labels use **camelCase** per the cross-SDK canonical specification.
```
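When comparing a C# scrape against the catalog of another Conductor SDK, the non-applicable list can be encoded so that only unexplained gaps get flagged. The following is a minimal Python sketch of such an operator-side check, not part of the SDK: `NOT_RECORDED_BY_CSHARP_RUNNER` and `unexpected_gaps` are hypothetical names, and it assumes you have already reduced scraped series to base metric names (for example by stripping histogram `_bucket`, `_sum`, and `_count` suffixes).

```python
from __future__ import annotations

# Canonical metrics that the C# worker runner registers but never increments,
# copied from the "Non-Applicable Metrics" table. Hypothetical constant name.
NOT_RECORDED_BY_CSHARP_RUNNER = frozenset({
    "task_ack_error_total",
    "task_ack_failed_total",
    "worker_restart_total",
    "external_payload_used_total",
})


def unexpected_gaps(reference_catalog: set[str], scraped_names: set[str]) -> set[str]:
    """Return metric names from another SDK's catalog that are missing from a
    C# scrape and are NOT explained by the documented non-applicable list.

    Both inputs are assumed to hold base metric names (histogram suffixes such
    as _bucket/_sum/_count already stripped).
    """
    missing = reference_catalog - scraped_names
    # Drop the gaps the C# docs declare intentional.
    return missing - NOT_RECORDED_BY_CSHARP_RUNNER
```

For example, with a reference catalog of `active_workers`, `http_api_client_request_seconds`, `task_ack_error_total`, and `worker_restart_total`, and a scrape containing only `active_workers`, the function reports just `http_api_client_request_seconds`.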
```diff
@@ -225,14 +250,25 @@ All labels use **camelCase** per the cross-SDK canonical specification.
 |---|---|---|
 | `taskType` | Most metrics | Task definition name (e.g. `"my_worker"`) |
 | `exception` | Error counters, `thread_uncaught_exceptions_total` | Exception class name (e.g. `"HttpRequestException"`) |
-| `status` | Time histograms | `"SUCCESS"` or `"FAILURE"` |
+| `status` | Task time histograms, `http_api_client_request_seconds` | On task time histograms, `"SUCCESS"` or `"FAILURE"`. On `http_api_client_request_seconds`, the HTTP status code as a string (e.g. `"200"`), or `"0"` on network failure. |
 | `workflowType` | `workflow_start_error_total`, `workflow_input_size_bytes` | Workflow definition name |
-| `version` | `workflow_input_size_bytes` | Workflow version string |
+| `version` | `workflow_input_size_bytes` | Workflow version as a string. Empty string when the version is absent. |
 | `entityName` | `external_payload_used_total` | Entity name |
 | `operation` | `external_payload_used_total` | `"READ"` or `"WRITE"` |
 | `payloadType` | `external_payload_used_total` | `"TASK_INPUT"`, `"TASK_OUTPUT"`, `"WORKFLOW_INPUT"`, `"WORKFLOW_OUTPUT"` |
 | `method` | `http_api_client_request_seconds` | HTTP verb (e.g. `"GET"`, `"POST"`) |
-| `uri` | `http_api_client_request_seconds` | Request path |
+| `uri` | `http_api_client_request_seconds` | Request path (interpolated, not templated -- see note below) |
+
+The OpenTelemetry .NET SDK is the [recommended way](https://learn.microsoft.com/en-us/dotnet/core/diagnostics/metrics-collection) to export `System.Diagnostics.Metrics` data to Prometheus (.NET has no built-in Prometheus exporter).
+Metrics exported this way carry `otel_scope_name="Conductor.Client"`, which the OTel exporter adds to every series
+to identify the originating `Meter`. This label does not appear in the output of other Conductor
+SDKs, which use native Prometheus client libraries that do not have this convention. There is
+currently no configuration option to suppress it ([opentelemetry-dotnet#5725](https://github.com/open-telemetry/opentelemetry-dotnet/issues/5725)).
+
+The `uri` label on `http_api_client_request_seconds` currently contains the interpolated request
+path (e.g. `/api/tasks/poll/batch/my_task_type`) rather than a templated form
+(`/api/tasks/poll/batch/{taskType}`). Operators who need bounded cardinality on this label can
+apply Prometheus `metric_relabel_configs` at scrape time.
 
 ## Bucket Boundaries
 
```
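The scrape-time relabeling suggested for the `uri` label amounts to a fully anchored regex match plus a fixed replacement. The Python sketch below mirrors that logic so a pattern can be tried against sample paths before it goes into a Prometheus config. Only the batch-poll path comes from this document; `URI_TEMPLATES` and `template_uri` are hypothetical names, and the rule list is an assumption you would extend with the other paths your workers actually call.

```python
import re

# Hypothetical rule list: (fully anchored pattern, fixed replacement).
# Only the batch-poll path is documented; add rules for other paths you see.
URI_TEMPLATES = [
    (re.compile(r"/api/tasks/poll/batch/[^/]+"), "/api/tasks/poll/batch/{taskType}"),
]


def template_uri(uri: str) -> str:
    """Collapse an interpolated request path into a bounded template.

    fullmatch() mirrors Prometheus relabel regexes, which are anchored at both
    ends. Paths that match no rule are returned unchanged, like a `replace`
    relabel action whose regex does not match.
    """
    for pattern, template in URI_TEMPLATES:
        if pattern.fullmatch(uri):
            return template
    return uri
```

In a `metric_relabel_configs` rule, the same pair would go into `regex` and `replacement`, with `source_labels: [uri]` and `target_label: uri` (the default `replace` action). One caveat worth checking against the Prometheus documentation: series from different task types can end up with identical label sets after rewriting, and Prometheus does not merge such duplicates, so confirm the scrape does not start reporting duplicate samples.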
```diff
@@ -270,3 +306,30 @@ MetricsCollector.CanonicalSizeBuckets
 6. **The `MetricsCollector` is available as a singleton via DI.** You can inject it into your
    own services to record `workflow_start_error_total`, `external_payload_used_total`, or any
    other metrics that occur outside the poll loop.
+
+## Troubleshooting
+
+### Metrics Are Empty
+
+- Verify that `MetricsCollector` is registered. When using `AddConductorWorker()`, it is
+  registered automatically as a singleton.
+- Verify that workers have polled or executed tasks. Metrics are created lazily when the
+  corresponding event occurs.
+- Confirm the scrape endpoint is reachable at the expected host and port.
+
+### Missing HTTP or Workflow Metrics
+
+- `http_api_client_request_seconds` is recorded inside `ApiClient.CallApi()` /
+  `CallApiAsync()`. It requires a `MetricsCollector` to be injected via DI. If you are
+  constructing `ApiClient` manually (outside `AddConductorWorker()`), ensure a
+  `MetricsCollector` instance is available in the service provider.
+- `workflow_start_error_total` and `workflow_input_size_bytes` are recorded in
+  `WorkflowExecutor.StartWorkflow()` and require the optional `MetricsCollector` parameter.
+  When using DI, the executor resolves `MetricsCollector` from the container automatically.
+
+### High Cardinality
+
+- Watch the `uri` label on `http_api_client_request_seconds`. The SDK records the
+  interpolated request path, which can include task type names and workflow IDs.
+- Avoid embedding user identifiers or unbounded values in task type, workflow type, or
+  external payload labels.
```
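For the High Cardinality checks, it can help to count how many distinct values a label takes in a saved scrape of the metrics endpoint. The following is a minimal Python sketch of an operator-side helper, not part of the SDK: `label_cardinality` is a hypothetical name, and its regexes handle only the common `name{labels} value` shape of the Prometheus text format rather than every edge case of the exposition spec.

```python
from __future__ import annotations

import re
from collections import defaultdict

# `name{labels} value` or `name value`; good enough for a quick audit, not a
# complete implementation of the text exposition format.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+\S+")
# key="value" pairs, allowing backslash-escaped characters inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def label_cardinality(exposition: str, label: str) -> dict[str, int]:
    """Count distinct values of `label` per series name in a saved scrape."""
    seen: dict[str, set[str]] = defaultdict(set)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels = match.group(1), match.group(2) or ""
        for key, value in LABEL_RE.findall(labels):
            if key == label:
                seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}
```

Running `label_cardinality(text, "uri")` over successive scrapes shows whether the `http_api_client_request_seconds` series keep growing as new task types or workflow IDs appear. Histogram series are counted under their suffixed names (such as `_count` or `_bucket`), since the sketch does not normalize them.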

docs/readme/workers.md

Lines changed: 3 additions & 14 deletions
```diff
@@ -50,19 +50,8 @@ Thread.Sleep(TimeSpan.FromSeconds(100));
 
 Check out our [integration tests](https://github.com/conductor-oss/csharp-sdk/blob/main/Tests/Worker/WorkerTests.cs) for more examples
 
-Worker SDK collects the following metrics:
-
-
-| Name | Purpose | Tags |
-| ------------------ | :------------------------------------------- | -------------------------------- |
-| task_poll_error | Client error when polling for a task queue | taskType, includeRetries, status |
-| task_execute_error | Execution error | taskType |
-| task_update_error | Task status cannot be updated back to server | taskType |
-| task_poll_counter | Incremented each time polling is done | taskType |
-| task_poll_time | Time to poll for a batch of tasks | taskType |
-| task_execute_time | Time to execute a task | taskType |
-| task_result_size | Records output payload size of a task | taskType |
-
-Metrics on client side supplements the one collected from server in identifying the network as well as client side issues.
+The worker framework records polling, execution, update, and error metrics via
+`MetricsCollector`. See [Metrics Documentation](../metrics.md) for the complete metric
+catalog, configuration options, and best practices.
 
 ### Next: [Create and Execute Workflows](https://github.com/conductor-oss/csharp-sdk/blob/main/docs/readme/workflow.md)
```
