[+] implement parallel source discovery #1378
Conversation
Coverage Report for CI Build 25027401738: coverage decreased (-0.2%) to 83.177%. No coverage regressions found.
💛 Coveralls
Force-pushed from a3814f5 to a715672.
Is this related to #1377?

In some way. I couldn't reproduce that issue, but I found that misconfigured discovery sources could cause huge delays.
Improves dead-source handling with parallel resolution and `instance_up=0` on discovery failure.

`Sources.ResolveDatabases()` previously resolved each source sequentially. A single slow or unresponsive source (e.g. a continuous-discovery endpoint behind a firewall) would block discovery of all subsequent sources for the full connection timeout duration.

Sources are now resolved concurrently using `sync.WaitGroup.Go()`. Results are collected into a pre-allocated indexed slice to preserve deterministic ordering. Per-source error logging with the source name is included in the resolver itself.

When a `SourcePostgresContinuous` or `SourcePatroni` source fails to resolve any databases, `LoadSources()` now emits `instance_up=0` to the configured sinks. This makes the failure visible in dashboards and alerting, consistent with how unreachable directly-monitored sources are handled.
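A minimal sketch of the concurrent resolution pattern described above. The `Source`/`SourceConn` types and `resolveOne` are simplified stand-ins, not pgwatch's real definitions; only the `sync.WaitGroup.Go()` fan-out (Go 1.25+) and the pre-allocated indexed slice reflect the actual change:

```go
package sources

import (
	"log/slog"
	"sync"
)

// Simplified stand-ins for pgwatch's real types.
type Source struct{ Name string }
type SourceConn struct{ Name string }
type Sources []Source

// resolveOne stands in for the real per-source resolver (e.g. querying a
// continuous-discovery endpoint); it may block for the full connection timeout.
func resolveOne(s Source) ([]SourceConn, error) {
	return []SourceConn{{Name: s.Name}}, nil
}

// ResolveDatabases resolves all sources concurrently, so a single slow or
// unreachable source no longer delays the others.
func (srcs Sources) ResolveDatabases() []SourceConn {
	resolved := make([][]SourceConn, len(srcs)) // indexed by source position for deterministic ordering
	var wg sync.WaitGroup
	for i := range srcs {
		wg.Go(func() { // sync.WaitGroup.Go, available since Go 1.25
			dbs, err := resolveOne(srcs[i])
			if err != nil {
				// per-source error logging with the source name
				slog.Error("could not resolve source", "source", srcs[i].Name, "error", err)
				return
			}
			resolved[i] = dbs
		})
	}
	wg.Wait()
	var out []SourceConn
	for _, dbs := range resolved { // flatten in the original source order
		out = append(out, dbs...)
	}
	return out
}
```

With this shape, a dead source costs at most one goroutine blocked for the timeout instead of serializing discovery of every source behind it.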
Force-pushed from a715672 to 75ea9bd.
```go
if onError != nil {
	onError(srcs[i].Name)
}
```
I have a concern here that can be reproduced with the following steps:

- Define a target that happens to be unreachable and is of the kind `postgres-continuous-discovery`
- pgwatch writes `instance_up = 0` for the target for a while, with `dbname = sourceName`
- The target becomes alive
- pgwatch runs for a while and now writes the updated `instance_up = 1`, but with a new `dbname = sourceName + "_" + realDbname`
- The target is down again and its `instance_up = 0` is written with `dbname = sourceName + "_" + realDbname`

So the full instance uptime history becomes a bit disconnected, with different dbname[s]. But generally, I think that's the best we can do, just wanted to note this behaviour.
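To illustrate with hypothetical names (source `mycluster`, real database `prod`), the resulting series would look roughly like this:

```
time  dbname          instance_up
t0    mycluster       0    (unreachable, discovery fails)
t1    mycluster_prod  1    (target came up, real dbname resolved)
t2    mycluster_prod  0    (target down again)
```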
Good catch! We could use `source` instead of `dbname`. This way we will know for sure at which point that happened.
```go
// WriteInstanceDown writes instance_up = 0 metric to sinks for the given source
func (r *Reaper) WriteInstanceDown(md *sources.SourceConn) {
	r.measurementCh <- metrics.MeasurementEnvelope{
		DBName:     md.Name,
		MetricName: specialMetricInstanceUp,
		Data: metrics.Measurements{metrics.Measurement{
			metrics.EpochColumnName: time.Now().UnixNano(),
			"kind":                  string(md.Kind),
			// ^^^^^^^^^^^^^^^^^^^^^^^
			specialMetricInstanceUp: 0},
		},
	}
}
```
This way Grafana could distinguish regular databases from all others.
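A minimal sketch of how this could be wired together, assuming the resolver's `onError` hook (shown in the diff above) is built inside `LoadSources()`. The function name `instanceDownOnResolveFailure`, the `conns` lookup, and the exact signatures are illustrative assumptions, not pgwatch's actual code:

```go
// instanceDownOnResolveFailure returns the callback the resolver invokes for
// every source it fails to resolve (hypothetical helper, see the onError hook
// in the diff above).
func (r *Reaper) instanceDownOnResolveFailure(conns []*sources.SourceConn) func(string) {
	return func(sourceName string) {
		for _, md := range conns {
			if md.Name != sourceName {
				continue
			}
			// Only continuous-discovery kinds get the synthetic reading here;
			// directly monitored sources already report instance_up when their
			// own connection check fails.
			if md.Kind == sources.SourcePostgresContinuous || md.Kind == sources.SourcePatroni {
				r.WriteInstanceDown(md) // DBName = source name itself (see discussion above)
			}
		}
	}
}
```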