Add batchprocessor to perf tests #2246
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```diff
@@           Coverage Diff            @@
##             main    #2246    +/-  ##
==========================================
- Coverage   87.55%   87.54%   -0.01%
==========================================
  Files         570      570
  Lines      193480   193480
==========================================
- Hits       169398   169385      -13
- Misses      23556    23569      +13
  Partials      526      526
```
Are you hitting the issue described in #2194 when running these? I've actually been working on adding batch processor benchmarks too, and whenever I've run them, they don't hit the limits with the current load generation setup.
This seems like as good a place as any to discuss benchmarks so I'm also curious about the scenarios that we want to test and whether to do continuous vs nightly. Just like with the current continuous benchmarks with the attribute processor, batching is sensitive to permutations across:
And across all three signal types as we take different code paths. My proposals would be:
We may want to add the ability to filter by dimensions in the aggregated charts to keep them sane, something like checkboxes for signal (logs/metrics/traces) and scenario (batch/baseline/otap).
Broadly agree with all that, Jake. I think generally it would be a good idea to revisit the line between continuous and nightly in light of the many new engine capabilities and see how we can pack more tests in while still aiming not to monopolize the shared runner instance (e.g. we can probably avoid blowing away the backend/loadgenerator after each test, parallelize some of them, run some with loadgen in-process, etc.).

I also agree that the chart layouts aren't really scaling well or providing an intuitive view of the world. Joe from F5 is looking at an alternative interface for comparing across different agents, which might be useful here when done. The orchestration framework has the ability to output all of the internal test data as parquet files (high-level summary as well as second-by-second metrics)... I have a vision where we replace the whole 'store a big blob of JSON and use some static HTML charts on top of it' with 'store a few parquet files and use the duckdb wasm plugin on top of those to provide a highly interactive view of test results over much longer time periods'. One of these days...

Obviously this is all just brain dump, no need to address as part of this PR =P
Totally makes sense! I think it probably wouldn't be too hard to get a couple of checkboxes for the signal/scenario dimensions into the continuous benchmark HTML charts (famous last words) in the meantime. I'm motivated to get some good batch processor benches up and running in the near term, so I'm happy to employ some robot helpers and see if I can hack that together if that's the direction we want to go. If there are concerns about monopolizing the runner because the matrix is too big, though, I totally understand. The proposal would be (batch(4) + baseline(2) + attr(4)) * signals (3) = 30 runs vs the 4 on that continuous chart today. We can alternatively start smaller and just add the batch + baseline scenarios into the chart (6 more scenarios) and not multiply across signals until we make other changes to use the shared runner more efficiently.
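The matrix arithmetic above can be sanity-checked with a quick sketch. The scenario and signal names here are hypothetical placeholders based only on the counts discussed in this thread:

```python
from itertools import product

# Hypothetical scenario configs matching the counts above:
# batch(4) + baseline(2) + attr(4) = 10 scenarios, each run
# across the 3 signal types.
scenarios = (
    [f"batch-{i}" for i in range(4)]
    + [f"baseline-{i}" for i in range(2)]
    + [f"attr-{i}" for i in range(4)]
)
signals = ["logs", "metrics", "traces"]

runs = list(product(scenarios, signals))
print(len(runs))  # 30 runs, vs the 4 on the continuous chart today
```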
I could also experiment with the larger matrix as a nightly and we can see how we like it and then add a smaller additional set to the continuous |
Glancing at it now, the current setup is taking anywhere from 30min to 1hr+ (!), and there's like 5 runs queued up... Maybe it's mostly time spent building, but I'd say start with nightly; we need to get those numbers way down before we start multiplying anything =P
ah nvm that 1hr+ must include queue time... looks like they mostly finish in ~30 min (breakdown: 4m build, 16+3+4m for logs, idle, passthrough - this seems high). |
That's a decent amount of time with the rate of PRs we have! How do we feel about adding just otap-batch-otap and otlp-batch-otlp to the continuous to give us something and have everything else be nightly? Maybe it's still too much given the queues we have... |
The majority of the tests can be pushed to nightly. I started with continuous to have them running frequently and adjust settings based on the runs; the plan was always to move them to nightly. I am also working on #1528 to make sure everything we test/publish about perf is captured.
I filed this, looks like something is broken with shutdown calls: #2257 That should drop the 16 minute one down to like 4 when fixed if someone has time to investigate. I definitely support having some batch testing in continuous (seems more useful than ATTR in any case), starting with the basic 2 and the rest in nightly sounds like a solid plan for now. |
```diff
 const fn default_otap_sizer_items() -> Sizer {
-    Sizer::Bytes
+    Sizer::Items
 }
```
Blocked on open-telemetry#2194

Trying to introduce batch processor to Perf tests, so as to catch ^ issues earlier. And also to actually measure the perf impact of batching!

---------

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Co-authored-by: Laurent Quérel <l.querel@f5.com>
(#2395)

# Change Summary

This PR moves the continuous batch processor benchmarks to the 100klrps scenario and adds an otap-batch-otap configuration. I think the batch processor benchmarks were mistakenly added to the "passthrough" scenario, which states it's for scenarios with no processor in the middle. The dashboard also does not seem to be set up properly for these, and we want to add otap-batch-otap as mentioned here: #2246 (comment)

- Closes #2277

Co-authored-by: albertlockett <a.lockett@f5.com>