Description
We never had these tests set up properly. Now that the Jaeger SDKs are deprecated, we don't even know whether there is a working example of adaptive sampling using the OTEL SDKs. We need to set up integration tests to validate that the functionality still works.
There are two possible ways we could validate adaptive sampling:
- Start generating load at a known rate and wait until the adaptive sampling probability is adjusted from the default to a certain expected level. This is a bit tricky since the performance of the trace generator is unpredictable in GH Action runners, so we don't quite know how much traffic it would generate.
- Instead of waiting to reach an expected level of sampling, we could run two different workloads, e.g. 1x volume and then 2x volume. Even if we don't know the exact expected sampling probability, we can expect it to become about 2x smaller once the load doubles.
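The second option boils down to a relative comparison between two observed probabilities. A minimal sketch of such a check is below; `ratioNear` and its tolerance parameter are hypothetical names for illustration, not part of Jaeger, and the 25% tolerance is just an assumed allowance for noisy CI runners.

```go
package main

import (
	"fmt"
	"math"
)

// ratioNear reports whether the probability observed under the 1x workload
// (p1x) is roughly wantRatio times the probability observed under the
// heavier workload (p2x). tolerance is relative, e.g. 0.25 means +/-25%.
// Hypothetical helper for illustration only.
func ratioNear(p1x, p2x, wantRatio, tolerance float64) bool {
	if p1x <= 0 || p2x <= 0 {
		// A zero or negative probability means we have nothing to compare.
		return false
	}
	ratio := p1x / p2x
	return math.Abs(ratio-wantRatio) <= tolerance*wantRatio
}

func main() {
	// Example values only: probability at 1x load vs. at 2x load.
	fmt.Println(ratioNear(0.10, 0.055, 2.0, 0.25))
}
```

A relative tolerance keeps the check independent of how much traffic the generator actually manages to produce, which is the point of this option.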
An even simpler alternative might be to just detect that the pre-configured default probability (which the backend returns in the absence of any traffic for a given service) changes to a different, calculated probability. For example, we could configure the default probability as 100%, configure the adaptive sampling target as 1 trace per second, and run the load generator at 10 traces per second. We would expect the calculated probability to go down to about 10% (that's strategy 1 above), but even a change to 50% would already indicate that adaptive sampling is working end to end. We don't need the e2e test to validate the exact calculation, since we do that in the unit tests.