
Add batchprocessor to perf tests #2246

Merged
jmacd merged 4 commits into open-telemetry:main from cijothomas:cijothomas/addbatch
Mar 13, 2026

Conversation

@cijothomas
Member

Blocked on #2194

Trying to introduce the batch processor to the perf tests, so as to catch issues like the one above (^) earlier, and also to actually measure the perf impact of batching!

@cijothomas cijothomas requested a review from a team as a code owner March 10, 2026 01:55
@codecov

codecov Bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.54%. Comparing base (b31d4d1) to head (ddc1987).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2246      +/-   ##
==========================================
- Coverage   87.55%   87.54%   -0.01%     
==========================================
  Files         570      570              
  Lines      193480   193480              
==========================================
- Hits       169398   169385      -13     
- Misses      23556    23569      +13     
  Partials      526      526              
Components             Coverage Δ
otap-dataflow          89.57% <100.00%> (-0.01%) ⬇️
query_abstraction      80.61% <ø> (ø)
query_engine           90.63% <ø> (ø)
syslog_cef_receivers   ∅ <ø> (∅)
otel-arrow-go          52.44% <ø> (ø)
quiver                 91.91% <ø> (ø)

@JakeDern
Contributor

Are you hitting the issue described in #2194 when running these? I've actually been working on adding batch processor benchmarks too, and whenever I've run them they don't hit the limits with the current load-generation setup.

@JakeDern
Contributor

This seems like as good a place as any to discuss benchmarks, so I'm also curious about the scenarios we want to test and whether to run them continuous vs. nightly. Just like the current continuous benchmarks with the attribute processor, batching is sensitive to permutations across:

  • otap -> batch -> otap
  • otlp -> batch -> otlp
  • otap -> batch -> otlp
  • otlp -> batch -> otap

And across all three signal types, as we take different code paths. My proposals would be:

  1. We should add batch processor benches to continuous, as it's a critical core component and very sensitive to changes in terms of perf.
  2. We should add otap -> otap and otlp -> otlp as baselines to the continuous charts.
  3. We should add the signal type as a dimension to the above matrix of scenarios, because perf characteristics can and likely will differ across them.

We may want to add the ability to filter by dimensions in the aggregated charts to keep them sane - something like checkboxes for signal (logs/metrics/traces) and scenario (batch/baseline/otap).

CC: @lquerel @clhain for any thoughts!
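To make the proposed matrix concrete, here's a minimal Rust sketch (hypothetical names, not the repo's actual benchmark harness) that enumerates the receiver/exporter permutations around the batch processor crossed with the three signal types:

```rust
// Hypothetical sketch of the proposed scenario matrix; names are
// illustrative, not the actual benchmark harness API.

#[derive(Debug, Clone, Copy)]
enum Protocol {
    Otap,
    Otlp,
}

#[derive(Debug, Clone, Copy)]
enum Signal {
    Logs,
    Metrics,
    Traces,
}

fn main() {
    let protocols = [Protocol::Otap, Protocol::Otlp];
    let signals = [Signal::Logs, Signal::Metrics, Signal::Traces];

    // 2 receiver protocols x 2 exporter protocols x 3 signals
    // = 12 batch scenarios before any baselines are added.
    for rx in protocols {
        for tx in protocols {
            for signal in signals {
                println!("{rx:?} -> batch -> {tx:?} [{signal:?}]");
            }
        }
    }
}
```

With two protocols on each side and three signals, the batch scenarios alone expand to 2 x 2 x 3 = 12 runs, which is where the runner-time concern discussed below comes from.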

@clhain
Contributor

clhain commented Mar 10, 2026

Broadly agree with all that, Jake.

I think generally it would be a good idea to revisit the line between continuous and nightly in light of the many new engine capabilities and see how we can pack more tests in while still aiming not to monopolize the shared runner instance (e.g., avoiding blowing away the backend/load generator after each test, parallelizing some of them, running some with loadgen in-process, etc.).

I also agree that the chart layouts aren't really scaling well or providing an intuitive view of the world. Joe from F5 is looking at an alternative interface for comparing across different agents, which might be useful here once it's done.

The orchestration framework has the ability to output all of the internal test data as parquet files (a high-level summary as well as second-by-second metrics)... I have a vision where we replace the whole 'store a big blob of json and use some static html charts on top of it' with 'store a few parquet files and use the duckdb wasm plugin on top of those to provide a highly interactive view of test results over much longer time periods'. One of these days...

Obviously this is all just brain dump, no need to address as part of this PR =P
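As a minimal sketch of that parquet idea (assuming the arrow and parquet crates; the schema, values, and file name here are illustrative, not the orchestration framework's actual output format):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Float64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One row per scenario: a hypothetical high-level summary table.
    let schema = Arc::new(Schema::new(vec![
        Field::new("scenario", DataType::Utf8, false),
        Field::new("logs_per_sec", DataType::Float64, false),
    ]));

    let columns: Vec<ArrayRef> = vec![
        Arc::new(StringArray::from(vec!["otap-batch-otap", "otlp-batch-otlp"])),
        Arc::new(Float64Array::from(vec![100_000.0, 92_000.0])), // dummy values
    ];
    let batch = RecordBatch::try_new(schema.clone(), columns)?;

    // Write the summary as a parquet file that duckdb (or its wasm build)
    // can later query directly, e.g. SELECT * FROM 'perf_summary.parquet'.
    let file = File::create("perf_summary.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```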

@JakeDern
Contributor

> Broadly agree with all that, Jake.
>
> I think generally it would be a good idea to revisit the line between continuous and nightly in light of the many new engine capabilities and see how we can pack more tests in while still aiming not to monopolize the shared runner instance (e.g., avoiding blowing away the backend/load generator after each test, parallelizing some of them, running some with loadgen in-process, etc.).
>
> I also agree that the chart layouts aren't really scaling well or providing an intuitive view of the world. Joe from F5 is looking at an alternative interface for comparing across different agents, which might be useful here once it's done.
>
> The orchestration framework has the ability to output all of the internal test data as parquet files (a high-level summary as well as second-by-second metrics)... I have a vision where we replace the whole 'store a big blob of json and use some static html charts on top of it' with 'store a few parquet files and use the duckdb wasm plugin on top of those to provide a highly interactive view of test results over much longer time periods'. One of these days...
>
> Obviously this is all just brain dump, no need to address as part of this PR =P

Totally makes sense! I think it probably wouldn't be too hard to get a couple of checkboxes for signal/scenario dimensions into the continuous benchmark HTML charts (famous last words) in the meantime. I'm motivated to get some good batch processor benches up and running in the near term, so I'm happy to employ some robot helpers and see if I can hack that together, if that's the direction we want to go.

If there are concerns about monopolizing the runner because the matrix is too big, though, I totally understand that. The proposal would be (batch(4) + baseline(2) + attr(4)) * signals(3) = 30 runs, vs. the 4 on that continuous chart today.

We can alternatively start smaller and just add the batch + baseline scenarios into the chart (6 more scenarios) and not multiply across signals until we make other changes to use the shared runner more efficiently.

@JakeDern
Contributor

I could also experiment with the larger matrix as a nightly, and we can see how we like it and then add a smaller additional set to the continuous.

@clhain
Contributor

clhain commented Mar 10, 2026

Glancing at it now, the current setup is taking anywhere from 30min to 1hr+ (!), and there's like 5 runs queued up... Maybe it's mostly time spent building, but I'd say start with nightly and we need to get those numbers way down before we start multiplying anything =P

@clhain
Contributor

clhain commented Mar 10, 2026

> Glancing at it now, the current setup is taking anywhere from 30min to 1hr+ (!), and there's like 5 runs queued up... Maybe it's mostly time spent building, but I'd say start with nightly and we need to get those numbers way down before we start multiplying anything =P

ah nvm, that 1hr+ must include queue time... looks like they mostly finish in ~30 min (breakdown: 4m build, then 16m/3m/4m for the logs, idle, and passthrough scenarios - the 16m seems high).

@JakeDern
Contributor

That's a decent amount of time with the rate of PRs we have! How do we feel about adding just otap-batch-otap and otlp-batch-otlp to the continuous to give us something and have everything else be nightly? Maybe it's still too much given the queues we have...

@github-actions github-actions Bot added the rust Pull requests that update Rust code label Mar 10, 2026
@cijothomas
Member Author

> That's a decent amount of time with the rate of PRs we have! How do we feel about adding just otap-batch-otap and otlp-batch-otlp to the continuous to give us something and have everything else be nightly? Maybe it's still too much given the queues we have...

The majority of the tests can be pushed to nightly. I started with continuous to have them running frequently and to adjust settings based on the runs; the plan was always to move them to nightly.

I am also working on #1528 to make sure everything we test/publish about perf is captured.

@clhain
Contributor

clhain commented Mar 10, 2026

I filed this - looks like something is broken with shutdown calls: #2257

When fixed, that should drop the 16-minute one down to like 4, if someone has time to investigate.

I definitely support having some batch testing in continuous (seems more useful than ATTR in any case); starting with the basic 2 and putting the rest in nightly sounds like a solid plan for now.


const fn default_otap_sizer_items() -> Sizer {
-    Sizer::Bytes
+    Sizer::Items
}
Contributor

Yes! (How did ...)
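A minimal sketch of how a unit test could pin that default (the Sizer enum and function here are reconstructed from the diff above, not copied from the repo):

```rust
#[derive(Debug, PartialEq, Eq)]
enum Sizer {
    Bytes,
    Items,
}

// Reconstructed from the diff: the items-variant default should return
// Sizer::Items, not Sizer::Bytes.
const fn default_otap_sizer_items() -> Sizer {
    Sizer::Items
}

#[cfg(test)]
mod tests {
    use super::*;

    // Guards against the copy-paste slip the review caught.
    #[test]
    fn items_default_returns_items() {
        assert_eq!(default_otap_sizer_items(), Sizer::Items);
    }
}
```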

@jmacd jmacd enabled auto-merge March 12, 2026 20:20
@jmacd jmacd added this pull request to the merge queue Mar 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 12, 2026
@jmacd jmacd added this pull request to the merge queue Mar 13, 2026
Merged via the queue into open-telemetry:main with commit bde436e Mar 13, 2026
67 checks passed
@cijothomas cijothomas deleted the cijothomas/addbatch branch March 13, 2026 02:42
cijothomas added a commit to cijothomas/otel-arrow that referenced this pull request Mar 17, 2026
Blocked on open-telemetry#2194

Trying to introduce batch processor to Perf tests, so as to catch ^
issues earlier. And also to actually measure the perf impact of
batching!

---------

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Co-authored-by: Laurent Quérel <l.querel@f5.com>
github-merge-queue Bot pushed a commit that referenced this pull request Mar 23, 2026
(#2395)

# Change Summary

This PR moves the continuous batch processor benchmarks to the 100klrps scenario and adds an otap-batch-otap configuration.

I think the batch processor benchmarks were mistakenly added to the "passthrough" scenario, which states it's for scenarios with no processor in the middle. The dashboard also does not seem to be set up properly for these, and we want to add otap-batch-otap, as mentioned here: #2246 (comment)

- Closes #2277

Co-authored-by: albertlockett <a.lockett@f5.com>

Labels

rust Pull requests that update Rust code

Projects

Status: Done


5 participants