feat: add Prometheus Pushgateway support for CLI apps #3176
coolwednesday wants to merge 3 commits into gofr-dev:development
CLI apps are short-lived and exit before Prometheus can scrape /metrics. This adds push-based metrics export via Pushgateway, configured through METRICS_PUSH_GATEWAY_URL env var, along with auto CLI metrics tracking (duration, success/error counters) and observability infrastructure. Closes gofr-dev#2232
Umang01-hash left a comment
- Issue #2232 explicitly listed "Support cleanup (optional) so old metrics don't pile up" as a requirement. Every CronJob run permanently adds a job group to the Pushgateway. Please add a `Delete(ctx context.Context) error` method on `PushGateway` using `pusher.DeleteContext(ctx)`, and a `METRICS_PUSH_GATEWAY_DELETE_ON_FINISH=true` env var to opt in.
- All apps without `APP_NAME` set push under the same job group and silently overwrite each other. Change the fallback to `filepath.Base(os.Args[0])` or add a dedicated `METRICS_PUSH_GATEWAY_JOB` env var override.
- Current max bucket is 60s. Cron buckets extend to 3600s. A 5-minute batch job falls into `+Inf` only. Align the upper boundary with `app_cron_duration_seconds`.
- Metric naming inconsistency with cron:
  - `app_cmd_errors_total` → `app_cmd_failures` (match cron's `_failures`)
  - `app_cmd_success_total` → `app_cmd_success` (match cron's no-`_total`)
  - Add `app_cmd_total` (match cron's `app_cron_job_total`)
- Move `metricServer.Shutdown(ctx)` before `container.Close()` in `Shutdown()` so the Prometheus scrape endpoint stops accepting requests before the OTel meter provider is shut down.
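The job-name fallback suggested in the second point could be sketched roughly as follows. This is only an illustration, not code from the PR: the function name `resolveJobName` is hypothetical, and the precedence (explicit `METRICS_PUSH_GATEWAY_JOB`, then `APP_NAME`, then the binary name) is an assumption, since the comment offers the last two as alternatives rather than as a chain.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveJobName picks the Pushgateway job label for a CLI app.
// Hypothetical helper; the precedence below is an assumption of this sketch:
//  1. METRICS_PUSH_GATEWAY_JOB (explicit override)
//  2. APP_NAME
//  3. the base name of the running binary
//
// getenv is injected so the logic can be exercised without touching
// the real process environment.
func resolveJobName(getenv func(string) string, argv0 string) string {
	if job := getenv("METRICS_PUSH_GATEWAY_JOB"); job != "" {
		return job
	}

	if name := getenv("APP_NAME"); name != "" {
		return name
	}

	// Two different binaries now land in different job groups
	// even when neither variable is set.
	return filepath.Base(argv0)
}

func main() {
	fmt.Println(resolveJobName(os.Getenv, os.Args[0]))
}
```

With neither variable set, two different binaries resolve to two different job groups, which addresses the silent overwrite described above.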
Regarding Comment 1 (Delete support / …): The Pushgateway documentation explicitly states that the Pushgateway is designed as a metric cache. The standard recommendation is to not delete pushed metrics, and instead use …
If you push and immediately delete, Prometheus may not have scraped yet (typical scrape interval is 15–30s), and the metrics are lost forever. There's no reliable way for the CLI to know whether Prometheus has completed its scrape before issuing a delete.
For users who need cleanup of stale metrics, this is best handled at the Pushgateway operational level (e.g., Pushgateway's own …).
This can always be revisited in a follow-up if users explicitly request it, but for v1 the "push and leave" approach is the correct and safe default.
Regarding Comment 5 (Shutdown order: move `metricServer.Shutdown` before `container.Close`): The current shutdown order is actually correct. The … For the Pushgateway path specifically, the push happens inside …
Regarding Comment 8 (`Factory.go` test coverage): The new pushgateway wiring in …
- Replace basic CMD Metrics panels with enriched CLI dashboard (health overview, job status table, duration bar chart with p50-p99)
- Add pushgateway service to http-server docker-compose
- Add pushgateway scrape config with honor_labels
- Add $job and $command template variables for CLI filtering
- Expand sample-cmd README with setup instructions

Summary
CLI applications are short-lived: they exit before Prometheus can scrape `/metrics`. This PR adds push-based metrics export via Prometheus Pushgateway for GoFr CLI apps, with cumulative counters across runs.
Closes #2232
Problem
Each CLI run starts counters at 0. A plain Pushgateway overwrites on each push, so counters never accumulate (run1=1, run2=1, run3=1 instead of 1, 2, 3). Gauges like `last_success_timestamp` must not be summed.
Solution: Read-Modify-Write
Instead of using an aggregation gateway, we implement read-modify-write on the standard Pushgateway:
Why not an aggregation gateway?
- `last_success_timestamp` would produce nonsensical values (timestamp1 + timestamp2)
- `clearmode` label, but dormant since Nov 2023, single maintainer, no prebuilt Docker images, only ~117 stars
Read-modify-write gives correct semantics with zero additional infrastructure. The trade-off is a small race window when concurrent CLI runs overlap, but this is unlikely for CLI workloads (worst case: one lost increment).
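To make the merge semantics concrete, here is a minimal sketch of the "modify" step under simplifying assumptions. It is not the PR's `mergeMetrics()`: the `sample` type, the `merge` function, and the use of a plain map keyed by metric name are all hypothetical, and histograms (which the PR also sums) are left out for brevity.

```go
package main

import "fmt"

// kind tells the merge step how a series is combined across runs.
type kind int

const (
	counter kind = iota
	gauge
)

// sample is a simplified, hypothetical stand-in for one metric series.
type sample struct {
	kind  kind
	value float64
}

// merge combines what the gateway already holds (existing) with the values
// produced by the current run (current):
//   - counters are summed, so they accumulate across runs
//   - gauges take the current run's value
//
// Series present on only one side are carried over unchanged.
func merge(existing, current map[string]sample) map[string]sample {
	out := make(map[string]sample, len(existing)+len(current))

	for name, s := range existing {
		out[name] = s
	}

	for name, cur := range current {
		if prev, ok := out[name]; ok && prev.kind == counter && cur.kind == counter {
			cur.value += prev.value
		}

		out[name] = cur
	}

	return out
}

func main() {
	stored := map[string]sample{} // what the gateway would hold between runs

	for run := 1; run <= 3; run++ {
		thisRun := map[string]sample{
			"app_cmd_success":                {kind: counter, value: 1},
			"app_cmd_last_success_timestamp": {kind: gauge, value: float64(1700000000 + run)},
		}

		stored = merge(stored, thisRun)
		fmt.Println(run, stored["app_cmd_success"].value)
	}
}
```

Over three runs the counter reads 1, 2, 3 while the timestamp gauge only ever holds the latest value. The race mentioned above comes from the gap between reading the stored state and writing the merged result back.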
What's included
- `pkg/gofr/metrics/exporters/pushgateway.go`:
  - `expfmt` for Prometheus text format encoding/decoding
  - `mergeMetrics()`: sums counters/histograms, replaces gauges
  - `labelKey()`: matches metrics by app-defined labels only, filtering out Pushgateway-injected (`job`, `instance`) and OTel scope labels
- `cmd.go`:
  - `app_cmd_duration_seconds` (histogram with CLI-appropriate buckets)
  - `app_cmd_success` (counter, cumulative across runs)
  - `app_cmd_failures` (counter, cumulative across runs)
  - `app_cmd_last_success_timestamp` (gauge, latest value)
- `run.go`: Calls `Shutdown()` after `cmd.Run()` to flush metrics
- `METRICS_PUSH_GATEWAY_URL` env var to enable (CLI only)
- `$job` and `$command` template variables for CLI filtering
- `honor_labels: true` in the Pushgateway scrape config
Design decisions
- `NewCMD()` only: HTTP apps continue using pull-based scraping
- `Container` owns the pushgateway and flushes on `Close()`
- No `prometheus/push` dependency: raw HTTP with `expfmt` gives full control over the read-modify-write cycle
- `AppRegistry` (not `DefaultGatherer`) to avoid pushing Go runtime metrics
- `${DataSource}` variable and collapsed row: non-intrusive for HTTP-only users
Test plan
- `go build ./...` compiles
- `go test ./pkg/gofr/metrics/exporters/`: 18 tests covering all merge logic, label filtering, error paths
- `go test ./pkg/gofr/container/ ./pkg/gofr/`: existing tests pass
- `golangci-lint run` clean
- `go vet -race` clean on our packages
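For readers following the label-matching behaviour that the description attributes to `labelKey()`, the idea can be sketched as below. This is an illustration under assumptions, not the PR's implementation: `appLabelKey` and `isInjected` are hypothetical names, and the set of filtered labels (`job`, `instance`, and anything starting with `otel_scope_`) is a guess based on the description above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isInjected reports whether a label was added by the Pushgateway or by the
// OTel exporter rather than by the application. The exact set is an
// assumption made for this sketch.
func isInjected(name string) bool {
	return name == "job" || name == "instance" || strings.HasPrefix(name, "otel_scope_")
}

// appLabelKey builds a stable key from app-defined labels only, so that a
// series read back from the gateway matches the same series produced locally
// even though the gateway has added labels of its own.
func appLabelKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))

	for name := range labels {
		if !isInjected(name) {
			names = append(names, name)
		}
	}

	// Map iteration order is random; sort so the key is deterministic.
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s=%q,", name, labels[name])
	}

	return b.String()
}

func main() {
	local := map[string]string{"command": "migrate"}
	fromGateway := map[string]string{
		"command":         "migrate",
		"job":             "sample-cmd",
		"instance":        "",
		"otel_scope_name": "gofr",
	}

	fmt.Println(appLabelKey(local) == appLabelKey(fromGateway))
}
```

Keying on app-defined labels only is what lets the merge step pair a stored series with its counterpart from the current run.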