Skip to content

feat: Add per-instance OTEL resource attributes for metrics and logs#77

Merged
rpurdel merged 3 commits intomainfrom
feat/otel-instance-id
Mar 23, 2026
Merged

feat: Add per-instance OTEL resource attributes for metrics and logs#77
rpurdel merged 3 commits intomainfrom
feat/otel-instance-id

Conversation

@rpurdel
Copy link
Copy Markdown

@rpurdel rpurdel commented Mar 23, 2026

Summary

  • Split createResource() into separate metrics and logs resources to handle Mimir vs Loki cardinality differently
  • Metrics use service.instance.id = container instance name (meeting ID) for unique series in Mimir
  • Logs use static service.instance.id to prevent Loki stream explosion, with runId carrying the instance identity as structured metadata
  • Worker passes CONTAINER_INSTANCE_NAME (the human-readable session/meeting ID) to container via startOptions.envVars
  • Added city and country resource attributes from CF runtime env vars

Key decisions

  • runId field name chosen to avoid Loki auto-indexing (unlike session_id or container_id)
  • startOptions.envVars fully replaces class-level envVars (CF containers ?? semantics), so buildContainerEnvVars() is shared between class default and start override
  • CONTAINER_INSTANCE_NAME falls back to CLOUDFLARE_DURABLE_OBJECT_ID then crypto.randomUUID() for non-CF environments

Test plan

  • Typecheck passes
  • All 438 unit tests pass
  • Deployed to staging — verified runId shows meeting UUID in Loki
  • Verified container starts with all env vars intact (no regression from envVars override)
  • Verified city/country appear as non-indexed fields in Loki

@claude
Copy link
Copy Markdown

claude Bot commented Mar 23, 2026

Code Review

Good PR overall — the Mimir/Loki cardinality split is the right approach and the implementation is clean. A few things worth addressing:

Bug: Stale JSDoc comment left in place

In src/telemetry.ts, the original createResource() JSDoc comment is not removed. Because the diff adds the new baseResourceAttributes() JSDoc immediately below the unchanged context line, the resulting file ends up with two back-to-back JSDoc blocks:

/**
 * Create shared resource with common attributes for both metrics and logs.
 */
/**
 * Build the base resource attributes shared by metrics and logs.
 */
function baseResourceAttributes(): Record<string, string> {

The first comment is now stale and should be deleted.

Minor: buildContainerEnvVars is called twice but only one call matters

The class-level envVars is initialised via buildContainerEnvVars(this.env), but per CF containers semantics documented in the PR, startOptions.envVars fully replaces it. Since the fetch handler always passes startOptions.envVars, the class-level call is never used in production. Fine as a safety net, but a comment would help future readers understand why the function appears in both places.

Minor: instanceId side-effect at module load time

The module-level instanceId constant is evaluated at import time. Tests that set process.env.CONTAINER_INSTANCE_NAME after importing telemetry.ts will not see the updated value. This is the correct pattern for a singleton server process — worth confirming test setup order does not hit this edge case.

Nit: Non-standard OTEL attribute names

city and country are non-standard OTEL semantic convention names (the spec uses cloud.region, host.name, etc.). The PR description explains this is intentional to avoid Loki auto-indexing. A short inline comment at the usage site would make the reasoning self-documenting for future readers.

Test coverage

No new unit tests for createMetricsResource / createLogsResource / buildContainerEnvVars. Given staging verification and all 438 existing tests passing this is not a blocker, but the two resource functions are pure and the service.instance.id difference between metrics and logs is the core invariant of this change — a quick unit test would lock it in.


Overall solid work. The stale JSDoc is the only thing I would fix before merge; the rest are suggestions.

@claude
Copy link
Copy Markdown

claude Bot commented Mar 23, 2026

Additional Code Review Notes

(Adding to the existing review — the stale JSDoc, double buildContainerEnvVars call, and instanceId module-load timing points already raised are all valid.)

Minor: service.instance.id static value duplicates the service name string

In createLogsResource():

attrs['service.instance.id'] = 'opus-transcriber-proxy';

This hardcodes the same string already assigned via ATTR_SERVICE_NAME a few lines above. If the service name ever changes, this static fallback needs a separate update. Consider extracting a SERVICE_NAME constant:

const SERVICE_NAME = 'opus-transcriber-proxy';
// ...
attrs[ATTR_SERVICE_NAME] = SERVICE_NAME;
// ...
attrs['service.instance.id'] = SERVICE_NAME; // static — keeps Loki stream count bounded

Pre-existing (not introduced here, but worth noting): OPENAI_API_KEY has no || '' fallback

In buildContainerEnvVars:

OPENAI_API_KEY: env.OPENAI_API_KEY,    // could be undefined → becomes the string "undefined"
GEMINI_API_KEY: env.GEMINI_API_KEY || '',  // correct

All other keys guard against undefined with || ''. This was in the original envVars block too, but the refactor is a good opportunity to fix it consistently.

Overall: Clean, well-motivated change. The Mimir/Loki split is the right architecture. The stale JSDoc is the only real fix needed before merge.

@rpurdel rpurdel merged commit 403bf51 into main Mar 23, 2026
2 checks passed
@rpurdel rpurdel deleted the feat/otel-instance-id branch March 23, 2026 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants