feat: add Prometheus Pushgateway support for CLI apps by coolwednesday · Pull Request #3176 · gofr-dev/gofr

coolwednesday · 2026-03-17T08:15:58Z

Summary

CLI applications are short-lived — they exit before Prometheus can scrape /metrics. This PR adds push-based metrics export via Prometheus Pushgateway for GoFr CLI apps with cumulative counters across runs.

Closes #2232

Problem

Each CLI run starts counters at 0. A plain Pushgateway overwrites on each push, so counters never accumulate (run1=1, run2=1, run3=1 instead of 1, 2, 3). Gauges like last_success_timestamp must not be summed.

Solution: Read-Modify-Write

Instead of using an aggregation gateway, we implement read-modify-write on the standard Pushgateway:

CLI Run N:
  1. GET /metrics from Pushgateway (fetch existing values)
  2. Gather local metrics from this run
  3. Merge:
     - Counters/Histograms → sum existing + local
     - Gauges → use local value (latest wins)
  4. PUT merged result back to Pushgateway
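
The merge in step 3 can be sketched as follows. This is a minimal illustration, not the code in pushgateway.go: `Kind`, `Sample`, and `merge` are hypothetical names, series are keyed by a plain string, and histograms (which would also sum their buckets, count, and sum) are left out for brevity.

```go
package main

import "fmt"

// Kind is a simplified stand-in for the Prometheus metric type.
type Kind int

const (
	Counter Kind = iota
	Gauge
)

// Sample is a hypothetical, flattened view of one metric series.
type Sample struct {
	Kind  Kind
	Value float64
}

// merge combines values already stored on the Pushgateway with values
// gathered during this run. Counters are summed; gauges take the local
// value. Series present on only one side are carried over unchanged.
func merge(existing, local map[string]Sample) map[string]Sample {
	out := make(map[string]Sample, len(existing)+len(local))
	for k, s := range existing {
		out[k] = s
	}
	for k, s := range local {
		prev, ok := out[k]
		if ok && s.Kind == Counter && prev.Kind == Counter {
			s.Value += prev.Value
		}
		out[k] = s
	}
	return out
}

func main() {
	existing := map[string]Sample{
		"app_cmd_success":                {Counter, 2},
		"app_cmd_last_success_timestamp": {Gauge, 1000},
	}
	local := map[string]Sample{
		"app_cmd_success":                {Counter, 1},
		"app_cmd_last_success_timestamp": {Gauge, 2000},
	}
	merged := merge(existing, local)
	fmt.Println(merged["app_cmd_success"].Value)                // 3
	fmt.Println(merged["app_cmd_last_success_timestamp"].Value) // 2000
}
```

With two earlier runs already stored, the third run pushes a counter of 3 while the timestamp gauge simply reflects the latest run.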

Why not an aggregation gateway?

Zapier prom-aggregation-gateway: Sums ALL metric types including gauges — last_success_timestamp would produce nonsensical values (timestamp1 + timestamp2)
Prometheus Gravel Gateway: Supports per-type aggregation via clearmode label, but dormant since Nov 2023, single maintainer, no prebuilt Docker images, only ~117 stars

Read-modify-write gives correct semantics with zero additional infrastructure. The trade-off is a small race window when concurrent CLI runs overlap, but this is unlikely for CLI workloads (worst case: one lost increment).

What's included

Read-modify-write Pushgateway client (pkg/gofr/metrics/exporters/pushgateway.go):
- Custom HTTP client using expfmt for Prometheus text format encoding/decoding
- mergeMetrics() — sums counters/histograms, replaces gauges
- labelKey() — matches metrics by app-defined labels only, filtering out Pushgateway-injected (job, instance) and OTel scope labels
Auto CLI metrics in cmd.go:
- app_cmd_duration_seconds (histogram with CLI-appropriate buckets)
- app_cmd_success (counter, cumulative across runs)
- app_cmd_failures (counter, cumulative across runs)
- app_cmd_last_success_timestamp (gauge, latest value)
CLI shutdown path in run.go: Calls Shutdown() after cmd.Run() to flush metrics
Config-driven: Set METRICS_PUSH_GATEWAY_URL env var to enable (CLI only)
Enriched CMD Metrics dashboard merged into the main GoFr Application Services Monitoring dashboard:
- Health Overview: jobs tracked, last push age, total successes/failures, success rate
- Per-Job Breakdown: table with merge transform (successes, failures, p95 duration, last success per job×command)
- Duration Analysis: bar chart with p50/p90/p95/p99 percentiles, gauge panel with thresholds
- $job and $command template variables for CLI filtering
Pushgateway added to http-server docker setup: docker-compose service + Prometheus scrape config with honor_labels: true
sample-cmd README expanded with setup instructions and GitHub links to shared docker/dashboard setup
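
To illustrate the label matching described for labelKey() above: series fetched from the Pushgateway carry job and instance labels that locally gathered series lack, so both sides need a key built from app-defined labels only. The sketch below is an assumption about how such a key could be built, not the PR's implementation; in particular it assumes OTel scope labels share an `otel_scope_` prefix, and the key format is arbitrary.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelKey builds a stable lookup key from the metric name and its
// app-defined labels. Labels injected by the Pushgateway (job, instance)
// and labels assumed to come from the OTel scope are skipped.
func labelKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		if k == "job" || k == "instance" || strings.HasPrefix(k, "otel_scope_") {
			continue
		}
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stability

	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%q", k, labels[k])
	}
	return b.String()
}

func main() {
	remote := map[string]string{"command": "hello", "job": "sample-cmd", "instance": ""}
	local := map[string]string{"command": "hello", "otel_scope_name": "gofr"}
	fmt.Println(labelKey("app_cmd_success", remote) == labelKey("app_cmd_success", local))
}
```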

Design decisions

Pushgateway is wired in NewCMD() only — HTTP apps continue using pull-based scraping
Container owns the pushgateway and flushes on Close()
Dropped prometheus/push dependency — raw HTTP with expfmt gives full control over the read-modify-write cycle
Uses dedicated AppRegistry (not DefaultGatherer) to avoid pushing Go runtime metrics
Dashboard uses ${DataSource} variable and collapsed row — non-intrusive for HTTP-only users
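
As a rough sketch of the opt-in behaviour: pushing is skipped when METRICS_PUSH_GATEWAY_URL is empty, and otherwise the merged result is sent to the Pushgateway's `/metrics/job/<job>` path. The helper name `pushURL` and the hard-coded job name are hypothetical, and the sketch assumes the job name contains no `/`.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// pushURL returns the Pushgateway endpoint for a job and whether pushing is
// enabled. An empty base URL means the feature is off.
// Assumption: job contains no "/" (such names need special encoding).
func pushURL(base, job string) (string, bool) {
	if base == "" {
		return "", false
	}
	return strings.TrimRight(base, "/") + "/metrics/job/" + url.PathEscape(job), true
}

func main() {
	target, enabled := pushURL(os.Getenv("METRICS_PUSH_GATEWAY_URL"), "sample-cmd")
	if !enabled {
		fmt.Println("push disabled: METRICS_PUSH_GATEWAY_URL is not set")
		return
	}
	fmt.Println("would PUT merged metrics to", target)
}
```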

Test plan

go build ./... compiles
go test ./pkg/gofr/metrics/exporters/ — 18 tests covering all merge logic, label filtering, error paths
go test ./pkg/gofr/container/ ./pkg/gofr/ — existing tests pass
golangci-lint run clean
go vet -race clean on our packages
Docker smoke test: run hello×6, fail×4, batch×1, progress×1 → counters accumulated correctly, gauge shows latest timestamp, histogram buckets merged
Grafana dashboard verified: all panels populated, merge transform working, otel labels hidden

CLI apps are short-lived and exit before Prometheus can scrape /metrics. This adds push-based metrics export via Pushgateway, configured through METRICS_PUSH_GATEWAY_URL env var, along with auto CLI metrics tracking (duration, success/error counters) and observability infrastructure. Closes gofr-dev#2232

Umang01-hash

Issue #2232 explicitly listed "Support cleanup (optional) so old metrics don't pile up" as a requirement. Every CronJob run permanently adds a job group to the Pushgateway. Please add a Delete(ctx context.Context) error method on PushGateway using pusher.DeleteContext(ctx), plus a METRICS_PUSH_GATEWAY_DELETE_ON_FINISH=true env var to opt in.
All apps without APP_NAME set push under the same job group and silently overwrite each other. Change the fallback to filepath.Base(os.Args[0]) or add a dedicated METRICS_PUSH_GATEWAY_JOB env var override.
Current max bucket is 60s. Cron buckets extend to 3600s. A 5-minute batch job falls into +Inf only. Align upper boundary with app_cron_duration_seconds.
Metric naming is inconsistent with cron:
app_cmd_errors_total → app_cmd_failures (match cron's _failures)
app_cmd_success_total → app_cmd_success (match cron's no-_total)
Add app_cmd_total (match cron's app_cron_job_total)
Move metricServer.Shutdown(ctx) before container.Close() in Shutdown() so the Prometheus scrape endpoint stops accepting requests before the OTel meter provider is shut down.
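
A possible shape for the job-name fallback requested above, combining both suggestions. The precedence (explicit override, then APP_NAME, then the executable's base name) is an assumption for illustration, and `jobName` is a hypothetical helper, not an existing GoFr function.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// jobName picks the Pushgateway job label.
// Assumed precedence: explicit override, then app name, then the base name
// of the executable, so unnamed apps no longer share one job group.
func jobName(override, appName, argv0 string) string {
	switch {
	case override != "":
		return override
	case appName != "":
		return appName
	default:
		return filepath.Base(argv0)
	}
}

func main() {
	fmt.Println(jobName(
		os.Getenv("METRICS_PUSH_GATEWAY_JOB"),
		os.Getenv("APP_NAME"),
		os.Args[0],
	))
}
```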

pkg/gofr/container/container.go

pkg/gofr/factory.go

pkg/gofr/run.go

pkg/gofr/metrics/exporters/pushgateway.go

coolwednesday · 2026-03-20T06:58:49Z

Regarding Comment 1 (Delete support / METRICS_PUSH_GATEWAY_DELETE_ON_FINISH):

The Pushgateway documentation explicitly states that the Pushgateway is designed as a metric cache — the standard recommendation is to not delete pushed metrics, and instead use job and instance labels to distinguish runs.

If you push and immediately delete, Prometheus may not have scraped yet (typical scrape interval is 15–30s), and the metrics are lost forever. There's no reliable way for the CLI to know whether Prometheus has completed its scrape before issuing a delete.

For users who need cleanup of stale metrics, this is best handled at the Pushgateway operational level (e.g., an external cron job that prunes old job groups through the Pushgateway's DELETE API), not at the framework level. Note that the Pushgateway itself does not offer a TTL for pushed metrics. Baking delete into the framework adds a footgun that's hard to use safely by default.

This can always be revisited in a follow-up if users explicitly request it, but for v1 the "push and leave" approach is the correct and safe default.

coolwednesday · 2026-03-20T07:18:49Z

Regarding Comment 5 (Shutdown order — move metricServer.Shutdown before container.Close):

The current shutdown order is actually correct:

httpServer.Shutdown → grpcServer.Shutdown → container.Close() → metricServer.Shutdown

The /metrics HTTP endpoint should stay alive as long as possible so Prometheus can scrape final metrics. Shutting it down earlier would mean Prometheus misses the last scrape window.

For the Pushgateway path specifically, the push happens inside container.Close() before the meter provider shuts down — which is the right sequence (push metrics first, then tear down the provider).
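
The ordering argument can be made concrete with a small sketch. The `step` type and `shutdownInOrder` helper are hypothetical and the step bodies are no-ops; the sketch only shows the steps running in the listed order, with container.Close (where the push happens) ahead of metricServer.Shutdown.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// step is a hypothetical named shutdown action.
type step struct {
	name string
	run  func(context.Context) error
}

// shutdownInOrder runs every step sequentially, continuing after failures,
// and returns the executed order together with any joined errors.
func shutdownInOrder(ctx context.Context, steps []step) ([]string, error) {
	var (
		order []string
		errs  error
	)
	for _, s := range steps {
		order = append(order, s.name)
		if err := s.run(ctx); err != nil {
			errs = errors.Join(errs, fmt.Errorf("%s: %w", s.name, err))
		}
	}
	return order, errs
}

// runDefault wires the order described in the comment above with no-op steps.
func runDefault() ([]string, error) {
	noop := func(context.Context) error { return nil }
	return shutdownInOrder(context.Background(), []step{
		{"httpServer.Shutdown", noop},
		{"grpcServer.Shutdown", noop},
		{"container.Close", noop}, // Pushgateway push happens here
		{"metricServer.Shutdown", noop},
	})
}

func main() {
	order, err := runDefault()
	fmt.Println(order, err)
}
```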

coolwednesday · 2026-03-20T07:18:51Z

Regarding Comment 8 (Factory.go test coverage):

The new pushgateway wiring in factory.go is 4 lines of config-read + constructor call. The core logic (NewPushGateway, Push) is already covered in pushgateway_test.go. Writing a proper test for the factory wiring requires heavy config mocking for minimal additional coverage. Deferring this to a follow-up PR.

- Replace basic CMD Metrics panels with enriched CLI dashboard (health overview, job status table, duration bar chart with p50-p99)
- Add pushgateway service to http-server docker-compose
- Add pushgateway scrape config with honor_labels
- Add $job and $command template variables for CLI filtering
- Expand sample-cmd README with setup instructions

coolwednesday · 2026-03-24T06:43:48Z

Here is the screenshot of the CLI dashboard:

Umang01-hash reviewed Mar 18, 2026

merge development to resolve go.work.sum conflicts

7bf1ee5

coolwednesday wants to merge 3 commits into gofr-dev:development from coolwednesday:feature/metrics-pushgateway-cli
