Conversation

Contributor

@ptodev ptodev commented Dec 9, 2025

This PR adds metadata support to the prometheus.remote_write component. Metadata is supported only when Remote Write v2 has been configured.

In order for prometheus.remote_write to receive metadata, prometheus.scrape must be configured with honor_metadata = true.

Contributor

github-actions bot commented Dec 9, 2025

💻 Deploy preview available (Support metadata in Remote Write V2):

Contributor

@kgeckhart kgeckhart left a comment

I ran out of time, so I didn't get to check this in depth; I'll do that tomorrow. At a surface level, it all makes sense.

Comment on lines +206 to +213
select {
case <-time.After(120 * time.Second):
	require.FailNow(t, "timed out waiting for metrics")
case actual := <-writeResult:
	require.JSONEq(t, expectedResponse, actual)
}
Contributor

This might end up with the same flakiness we fixed in 58cce1d, since it depends on all the metrics being appended before the default batch deadline. It might be fine, but we might be able to improve runtime and stability.

I think this is true for the other usages here too.

Contributor Author

Looking at this now, I'm not really sure why it would fail. We call Commit() once we have finished forwarding all the samples we want to send. The tests write so few samples that it'd be reasonable to expect the default config to send all of them at once rather than in tiny batches. I tried to reproduce the failure using the command you mentioned, and I even bumped the count up to 200, but the tests still passed...

└─▪ go test ./internal/component/prometheus/remotewrite/... -run '^Test$' -count 200
ok      github.com/grafana/alloy/internal/component/prometheus/remotewrite      0.512s [no tests to run]

Am I missing something?

Contributor

@kgeckhart kgeckhart Dec 11, 2025

I didn't test it with your changes, and I missed the adjustment to send all the metrics before commit.

It is still susceptible to the same flaw as the commit-per-metric version, though. The watcher sends a batch of decoded samples, and queue_manager iterates over it to enqueue each sample in a shard, appending them one at a time to the current batch. At any point during that iteration the shard timer can fire, causing the current batch to be sent.

Committing the whole batch will help, but you're fighting a race condition with the shard timer that you could lose at any point. The original version didn't fail often: even with my multiple bad refactors, I ran it 100 times and the runs didn't catch it.
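
The race described above can be illustrated with a heavily simplified sketch. This is not the Prometheus queue_manager code: `enqueue` and the `timerFired` callback are hypothetical, and the timer is modelled as a deterministic function so the split is reproducible.

```go
package main

import "fmt"

// enqueue appends samples to a pending batch one at a time. Before each
// append it asks timerFired whether the (simulated) shard timer has gone off;
// if so, the partial batch is sent early. Whatever is left is sent at the end.
func enqueue(samples []string, timerFired func(i int) bool) [][]string {
	var sent [][]string
	var pending []string
	for i, sample := range samples {
		if timerFired(i) && len(pending) > 0 {
			sent = append(sent, pending)
			pending = nil
		}
		pending = append(pending, sample)
	}
	if len(pending) > 0 {
		sent = append(sent, pending)
	}
	return sent
}

func main() {
	samples := []string{"a", "b", "c", "d"}

	never := func(int) bool { return false }       // timer never fires mid-iteration
	midway := func(i int) bool { return i == 2 }   // timer fires before the third sample

	fmt.Println(len(enqueue(samples, never)))  // prints 1
	fmt.Println(len(enqueue(samples, midway))) // prints 2
}
```

The same input produces one request or two depending only on when the timer fires, which is why a test that expects a single exact payload can fail intermittently.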

Contributor

@kgeckhart kgeckhart left a comment

Just a few more minor comments, nice work!

@ptodev force-pushed the kgeckhart/metadata-integration-tests-remote_write branch from 757c028 to 46d0290 on December 10, 2025 22:44
@ptodev force-pushed the kgeckhart/metadata-integration-tests-remote_write branch from 46d0290 to 0b23768 on December 11, 2025 18:04
2 participants