xdsclient: Fix flakiness in TestResourceUpdateMetrics in the case of repeated NACKs
#8363
+2
−2
Fixes: #8344
Successful forge test
In the rare case of a NACK, the xDS client can emit metrics in quick succession because of the interaction sequence between the xDS client (in grpc-go) and the test management server (using go-control-plane) after the initial invalid update. The test uses `testMetricsReporter` for the xDS client to report metrics, which calls `metricsReporter.ReportMetric()` to report invalid and valid xDS resource updates. The flakiness is due to a timeout in `r.metricsCh.Send(m)` when a burst of redundant updates arrives, because the `metricsCh` in `testMetricsReporter` is a `testutils.Channel` initialized with a size of 1. A Send operation on this channel blocks if the channel is full. See #8344 (comment).

We can mitigate the race by making the metric sends non-blocking: drop the redundant update and replace it with the new one. This is similar to what the internal grpc metrics recorder does: https://github.com/grpc/grpc-go/blob/master/internal/testutils/stats/test_metrics_recorder.go#L125-L126
RELEASE NOTES: None