-
Notifications
You must be signed in to change notification settings - Fork 494
Support metadata in Remote Write V2 #5045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
💻 Deploy preview available (Support metadata in Remote Write V2): |
kgeckhart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran out of time so didn't quite get to check this in depth and will do it tomorrow. Surface level it all makes sense.
| select { | ||
| case <-time.After(120 * time.Second): | ||
| require.FailNow(t, "timed out waiting for metrics") | ||
| case actual := <-writeResult: | ||
| require.JSONEq(t, expectedResponse, actual) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might end up with the same flakiness we fixed in, 58cce1d, since it depends on all the metrics to be appended before the default batch deadline. Might be fine but might be able to improve runtime/stability.
I think this is true for the other usages here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this now, I'm not really sure why it would fail. We call Commit() when we have finished forwarding all the samples we want to send. The tests write so few samples, that it'd be reasonable to expect the default config to send all of them at once rather than in tiny batches. I tried to reproduce the failure using the command you mentioned, and I even bumped up the counter to 200, but the tests still passed...
└─▪ go test ./internal/component/prometheus/remotewrite/... -run '^Test$' -count 200
ok github.com/grafana/alloy/internal/component/prometheus/remotewrite 0.512s [no tests to run]
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test it with your changes and I did miss the adjustment to send all the metrics before commit.
It is still susceptible to the same flaw as the commit per metric version though. The watcher is going to send a batch of decoded samples which queue_manager is going to iterate over to enqueue in a shard, individually appending them to the current batch, at any point while iterating the shard timer can trigger causing the current batch to be sent.
Committing the whole batch will help but you're fighting a race condition with the shard timer that you could lose at any point in time. The original version didn't fail often, even with my multiple bad refactors I ran it 100 times and it didn't catch it.
kgeckhart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more minor comments, nice work!
757c028 to
46d0290
Compare
46d0290 to
0b23768
Compare
Adding metadata support to
prometheus.remote_writecomponent, but only if Remote Write v2 has been configured.In order for
prometheus.remote_writeto receive metadata,prometheus.scrapemust be configured withhonor_metadata = true.