Skip to content

Conversation

@fenos
Copy link
Contributor

@fenos fenos commented Dec 31, 2025

What kind of change does this PR introduce?

Feature

What is the current behavior?

Currently metrics are gathered using the prom-client and exposed to a metric endpoint

What is the new behavior?

  • All metrics are now otel metrics and can be pushed to a 3rd party otel backend
  • We still can optionally expose prometheus metrics in the /metrics endpoint using the otel prom exporter (ENABLE_PROMETHEUS_METRICS)
  • All children spans will now include the tenant.ref attribute from a top level span
  • Revamped grafana dashboard using otel metrics

Grafana dashboard

Running completely locally

png (2)

@coveralls
Copy link

coveralls commented Dec 31, 2025

Pull Request Test Coverage Report for Build 20783124376

Details

  • 428 of 943 (45.39%) changed or added relevant lines in 23 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.4%) to 75.635%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/admin-app.ts 5 6 83.33%
src/internal/database/pool.ts 1 2 50.0%
src/internal/queue/event.ts 1 7 14.29%
src/internal/queue/queue.ts 3 9 33.33%
src/internal/monitoring/system/base-collector.ts 33 45 73.33%
src/internal/monitoring/system/process-start-collector.ts 12 26 46.15%
src/http/plugins/metrics.ts 105 124 84.68%
src/internal/monitoring/system/handles-collector.ts 10 36 27.78%
src/internal/monitoring/system/instrumentation.ts 41 82 50.0%
src/internal/monitoring/system/cpu-collector.ts 3 49 6.12%
Files with Coverage Reduction New Missed Lines %
src/admin-app.ts 1 89.19%
Totals Coverage Status
Change from base Build 20603297633: -0.4%
Covered Lines: 25678
Relevant Lines: 33679

💛 - Coveralls

@fenos fenos force-pushed the monitoring/otel-metrics branch from ddcc147 to c9d5fd0 Compare January 7, 2026 13:29
Comment on lines +605 to +617
{
"datasource": {
"type": "prometheus",
"uid": "local_prometheus"
},
"editorMode": "code",
"expr": "sum(increase(storage_api_otel_upload_started_total{region=~\"$region\", instance=~\"$instance\"}[$__range]))",
"hide": false,
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "C"
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was supposed to be multipart I guess according to description but instead duplicates query A

Context,
} from '@opentelemetry/api'
import { ReadableSpan, SpanProcessor } from '@opentelemetry/sdk-trace-base'
import { Span as SdkSpan } from '@opentelemetry/sdk-trace-base/build/src/Span'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible not to import from build?

hostMetrics.start()

// Register Node.js runtime instrumentations
registerInstrumentations({

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to capture returned function unregister here to pass shutdown below. Otherwise, it might prevent process from exiting

this.updateMetrics().catch(() => {
// Ignore errors
})
this._updateInterval = setInterval(() => this.updateMetrics().catch(() => {}), 5000)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re:https://github.com/supabase/storage/pull/819/changes#r2686374454

If we address above comment, this would be unnecessary but I would add unref call here

is_standard: uploadType === 'standard' ? 1 : 0,
is_s3: uploadType === 's3' ? 1 : 0,
fileUploadedSuccess.add(1, {
is_multipart: String(uploadType === 'resumable' ? 1 : 0),
Copy link

@ferhatelmas ferhatelmas Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an inconsistency in the label values of fileUploadStarted (https://github.com/supabase/storage/pull/819/changes#diff-27bcc650282e1d4d6ab25be5c1bf343e75aa41e76844d93be70a146b4391a82bR83). Its is_multipart can be true or false but here 1 or 0. Is this intentional?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants