
Add batchprocessor to perf tests #2246

Merged
jmacd merged 4 commits into open-telemetry:main from cijothomas:cijothomas/addbatch
Mar 13, 2026

Conversation

@cijothomas
Member

Blocked on #2194

Trying to introduce the batch processor to the perf tests, so as to catch issues like the one above (^) earlier, and also to actually measure the perf impact of batching!

@cijothomas cijothomas requested a review from a team as a code owner March 10, 2026 01:55
@codecov

codecov Bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.54%. Comparing base (b31d4d1) to head (ddc1987).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2246      +/-   ##
==========================================
- Coverage   87.55%   87.54%   -0.01%     
==========================================
  Files         570      570              
  Lines      193480   193480              
==========================================
- Hits       169398   169385      -13     
- Misses      23556    23569      +13     
  Partials      526      526              
Components             Coverage Δ
otap-dataflow          89.57% <100.00%> (-0.01%) ⬇️
query_abstraction      80.61% <ø> (ø)
query_engine           90.63% <ø> (ø)
syslog_cef_receivers   ∅ <ø> (∅)
otel-arrow-go          52.44% <ø> (ø)
quiver                 91.91% <ø> (ø)

@JakeDern
Contributor

Are you hitting the issue described in #2194 when running these? I've actually been working on adding batch processor benchmarks too, and whenever I've run them they don't hit the limits with the current load-generation setup.

@JakeDern
Contributor

This seems like as good a place as any to discuss benchmarks, so I'm also curious about the scenarios we want to test and whether to run them continuous vs. nightly. Just like the current continuous benchmarks with the attribute processor, batching is sensitive to permutations across:

  • otap -> batch -> otap
  • otlp -> batch -> otlp
  • otap -> batch -> otlp
  • otlp -> batch -> otap

And across all three signal types, as we take different code paths. My proposals would be:

  1. We should add batch processor benches to continuous, as it's a critical core component and very sensitive to changes in terms of perf.
  2. We should add otap -> otap and otlp -> otlp as baselines to the continuous charts.
  3. We should add the signal type as a dimension to the above matrix of scenarios, because perf characteristics can and likely will differ across them.

We may want to add the ability to filter by dimensions in the aggregated charts to keep them sane - something like checkboxes for signal (logs/metrics/traces) and scenario (batch/baseline/otap).

CC: @lquerel @clhain for any thoughts!
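To make the proposed matrix concrete, here's a minimal Rust sketch (hypothetical names, not the repo's actual benchmark harness) that enumerates the receiver/exporter permutations around the batch processor crossed with the three signal types:

```rust
// Hypothetical sketch of the proposed scenario matrix; names are
// illustrative, not the actual benchmark harness API.

#[derive(Debug, Clone, Copy)]
enum Protocol {
    Otap,
    Otlp,
}

#[derive(Debug, Clone, Copy)]
enum Signal {
    Logs,
    Metrics,
    Traces,
}

fn main() {
    let protocols = [Protocol::Otap, Protocol::Otlp];
    let signals = [Signal::Logs, Signal::Metrics, Signal::Traces];

    // 2 receiver protocols x 2 exporter protocols x 3 signals
    // = 12 batch scenarios before any baselines are added.
    for rx in protocols {
        for tx in protocols {
            for signal in signals {
                println!("{rx:?} -> batch -> {tx:?} [{signal:?}]");
            }
        }
    }
}
```

With two protocols on each side and three signals, the batch scenarios alone expand to 2 x 2 x 3 = 12 runs, which is where the runner-time concern discussed below comes from.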

@clhain
Contributor

clhain commented Mar 10, 2026

Broadly agree with all that, Jake.

I think generally it would be a good idea to revisit the line between continuous and nightly in light of the many new engine capabilities and see how we can pack more tests in while still aiming not to monopolize the shared runner instance (e.g., avoiding blowing away the backend/load generator after each test, parallelizing some of them, running some with loadgen in-process, etc.).

I also agree that the chart layouts aren't really scaling well or providing an intuitive view of the world. Joe from F5 is looking at an alternative interface for comparing across different agents, which might be useful here once it's done.

The orchestration framework has the ability to output all of the internal test data as parquet files (a high-level summary as well as second-by-second metrics)... I have a vision where we replace the whole 'store a big blob of json and use some static html charts on top of it' with 'store a few parquet files and use the duckdb wasm plugin on top of those to provide a highly interactive view of test results over much longer time periods'. One of these days...

Obviously this is all just brain dump, no need to address as part of this PR =P
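As a minimal sketch of that parquet idea (assuming the arrow and parquet crates; the schema, values, and file name here are illustrative, not the orchestration framework's actual output format):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Float64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One row per scenario: a hypothetical high-level summary table.
    let schema = Arc::new(Schema::new(vec![
        Field::new("scenario", DataType::Utf8, false),
        Field::new("logs_per_sec", DataType::Float64, false),
    ]));

    let columns: Vec<ArrayRef> = vec![
        Arc::new(StringArray::from(vec!["otap-batch-otap", "otlp-batch-otlp"])),
        Arc::new(Float64Array::from(vec![100_000.0, 92_000.0])), // dummy values
    ];
    let batch = RecordBatch::try_new(schema.clone(), columns)?;

    // Write the summary as a parquet file that duckdb (or its wasm build)
    // can later query directly, e.g. SELECT * FROM 'perf_summary.parquet'.
    let file = File::create("perf_summary.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, None)?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```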

@JakeDern
Contributor

> Broadly agree with all that, Jake.
>
> I think generally it would be a good idea to revisit the line between continuous and nightly in light of the many new engine capabilities and see how we can pack more tests in while still aiming not to monopolize the shared runner instance (e.g., avoiding blowing away the backend/load generator after each test, parallelizing some of them, running some with loadgen in-process, etc.).
>
> I also agree that the chart layouts aren't really scaling well or providing an intuitive view of the world. Joe from F5 is looking at an alternative interface for comparing across different agents, which might be useful here once it's done.
>
> The orchestration framework has the ability to output all of the internal test data as parquet files (a high-level summary as well as second-by-second metrics)... I have a vision where we replace the whole 'store a big blob of json and use some static html charts on top of it' with 'store a few parquet files and use the duckdb wasm plugin on top of those to provide a highly interactive view of test results over much longer time periods'. One of these days...
>
> Obviously this is all just brain dump, no need to address as part of this PR =P

Totally makes sense! I think it probably wouldn't be too hard to get a couple of checkboxes for signal/scenario dimensions into the continuous benchmark HTML charts (famous last words) in the meantime. I'm motivated to get some good batch processor benches up and running in the near term, so I'm happy to employ some robot helpers and see if I can hack that together, if that's the direction we want to go.

If there are concerns about monopolizing the runner because the matrix is too big, though, I totally understand that. The proposal would be (batch(4) + baseline(2) + attr(4)) * signals(3) = 30 runs, vs. the 4 on that continuous chart today.

We can alternatively start smaller and just add the batch + baseline scenarios into the chart (6 more scenarios) and not multiply across signals until we make other changes to use the shared runner more efficiently.

@JakeDern
Contributor

I could also experiment with the larger matrix as a nightly, and we can see how we like it and then add a smaller additional set to the continuous.

@clhain
Contributor

clhain commented Mar 10, 2026

Glancing at it now, the current setup is taking anywhere from 30min to 1hr+ (!), and there's like 5 runs queued up... Maybe it's mostly time spent building, but I'd say start with nightly and we need to get those numbers way down before we start multiplying anything =P

@clhain
Contributor

clhain commented Mar 10, 2026

> Glancing at it now, the current setup is taking anywhere from 30min to 1hr+ (!), and there's like 5 runs queued up... Maybe it's mostly time spent building, but I'd say start with nightly and we need to get those numbers way down before we start multiplying anything =P

ah nvm, that 1hr+ must include queue time... looks like they mostly finish in ~30 min (breakdown: 4m build, then 16m/3m/4m for the logs, idle, and passthrough scenarios - the 16m seems high).

@JakeDern
Contributor

That's a decent amount of time with the rate of PRs we have! How do we feel about adding just otap-batch-otap and otlp-batch-otlp to the continuous to give us something and have everything else be nightly? Maybe it's still too much given the queues we have...

@github-actions github-actions Bot added the rust Pull requests that update Rust code label Mar 10, 2026
@cijothomas
Member Author

> That's a decent amount of time with the rate of PRs we have! How do we feel about adding just otap-batch-otap and otlp-batch-otlp to the continuous to give us something and have everything else be nightly? Maybe it's still too much given the queues we have...

The majority of the tests can be pushed to nightly. I started with continuous to have them running frequently and to adjust settings based on the runs; the plan was always to move them to nightly.

I am also working on #1528 to make sure everything we test/publish about perf is captured.

@clhain
Contributor

clhain commented Mar 10, 2026

I filed this - looks like something is broken with shutdown calls: #2257

When fixed, that should drop the 16-minute one down to like 4, if someone has time to investigate.

I definitely support having some batch testing in continuous (seems more useful than ATTR in any case); starting with the basic 2 and putting the rest in nightly sounds like a solid plan for now.


const fn default_otap_sizer_items() -> Sizer {
-    Sizer::Bytes
+    Sizer::Items
}
Contributor

Yes! (How did ...)
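A minimal sketch of how a unit test could pin that default (the Sizer enum and function here are reconstructed from the diff above, not copied from the repo):

```rust
#[derive(Debug, PartialEq, Eq)]
enum Sizer {
    Bytes,
    Items,
}

// Reconstructed from the diff: the items-variant default should return
// Sizer::Items, not Sizer::Bytes.
const fn default_otap_sizer_items() -> Sizer {
    Sizer::Items
}

#[cfg(test)]
mod tests {
    use super::*;

    // Guards against the copy-paste slip the review caught.
    #[test]
    fn items_default_returns_items() {
        assert_eq!(default_otap_sizer_items(), Sizer::Items);
    }
}
```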

@jmacd jmacd enabled auto-merge March 12, 2026 20:20
@jmacd jmacd added this pull request to the merge queue Mar 12, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 12, 2026
@jmacd jmacd added this pull request to the merge queue Mar 13, 2026
Merged via the queue into open-telemetry:main with commit bde436e Mar 13, 2026
67 checks passed
@cijothomas cijothomas deleted the cijothomas/addbatch branch March 13, 2026 02:42
cijothomas added a commit to cijothomas/otel-arrow that referenced this pull request Mar 17, 2026
Blocked on open-telemetry#2194

Trying to introduce batch processor to Perf tests, so as to catch ^
issues earlier. And also to actually measure the perf impact of
batching!

---------

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Co-authored-by: Laurent Quérel <l.querel@f5.com>
github-merge-queue Bot pushed a commit that referenced this pull request Mar 23, 2026
(#2395)

# Change Summary

This PR moves the continuous batch processor benchmarks to the 100klrps scenario and adds an otap-batch-otap configuration.

I think the batch processor benchmarks were mistakenly added to the "passthrough" scenario, which states it's for scenarios with no processor in the middle. The dashboard also does not seem to be set up properly for these, and we want to add otap-batch-otap, as mentioned here: #2246 (comment)

- Closes #2277

Co-authored-by: albertlockett <a.lockett@f5.com>

Labels

rust Pull requests that update Rust code

Projects

Status: Done


5 participants