@jack-berg
Member

Followup to #8000

@jack-berg jack-berg requested a review from a team as a code owner January 28, 2026 22:14

/**
* The number of record operations per benchmark invocation. By using a constant across benchmarks
* of different signals, it's easier to compare benchmark results across signals.
Member Author


If span, metric, and log benchmarks all record the same number of operations per benchmark invocation, we can see the relative cost of spans vs. logs vs. metrics. Even though it will never be a perfect apples-to-apples comparison, it's still useful to know the order-of-magnitude cost of the different signals.

Make sense?

/**
* Notes on interpreting the data:
* This benchmark measures the performance of recording metrics and includes the following
* dimensions:
Member Author


One of my initial concerns with public benchmarks was that they need to be contextualized.

To address this, I'd like to:

  • Put some effort into making the javadoc for our public benchmarks up to date and useful
  • Update the benchmark static webpage to link to the relevant javadoc for each benchmark

* BatchSpanProcessor} paired with a noop {@link SpanExporter}. In order to avoid quickly outpacing
* the batch processor queue and dropping spans, the processor is configured with a queue size of
* {@link SpanRecordBenchmark#RECORDS_PER_INVOCATION} * {@link SpanRecordBenchmark#MAX_THREADS} and
* is flushed after each invocation.
Member Author


This is a key aspect of a useful span record benchmark (and log record benchmark) IMO. We need to isolate the benchmark from the export path, which is noisy due to the network dependency, while still being realistic. My definition of realistic is a batch span processor plus a harness that makes sure spans aren't just being dropped on the floor because the queue is full.
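
To illustrate the sizing argument, here is a minimal sketch in plain Java. It is not the SDK's batch processor: `BoundedQueueSketch`, `simulate`, and the constant values are hypothetical stand-ins. It assumes the worst case, where every thread records its full batch before anything drains the queue, and simulates the threads sequentially so the drop counts are deterministic.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for a batch processor queue; not the OpenTelemetry SDK implementation. */
final class BoundedQueueSketch {
  // Illustrative values only; the real constants are defined in SpanRecordBenchmark.
  static final int RECORDS_PER_INVOCATION = 1000;
  static final int MAX_THREADS = 4;

  private final BlockingQueue<String> queue;
  private long dropped;

  BoundedQueueSketch(int capacity) {
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  /** Enqueues a record; offer() returns false when the queue is full, which models a drop. */
  void record(String span) {
    if (!queue.offer(span)) {
      dropped++;
    }
  }

  /** Models the flush after each invocation by emptying the queue. */
  void flush() {
    queue.clear();
  }

  /**
   * Simulates the worst case where nothing drains the queue during an invocation. Threads are
   * simulated sequentially so the drop count is deterministic.
   */
  static long simulate(int capacity, int invocations, boolean flushAfterEach) {
    BoundedQueueSketch processor = new BoundedQueueSketch(capacity);
    for (int invocation = 0; invocation < invocations; invocation++) {
      for (int thread = 0; thread < MAX_THREADS; thread++) {
        for (int i = 0; i < RECORDS_PER_INVOCATION; i++) {
          processor.record("span");
        }
      }
      if (flushAfterEach) {
        processor.flush();
      }
    }
    return processor.dropped;
  }

  public static void main(String[] args) {
    int fullSize = RECORDS_PER_INVOCATION * MAX_THREADS;
    System.out.println("sized and flushed, dropped: " + simulate(fullSize, 10, true));
    System.out.println("undersized, dropped: " + simulate(RECORDS_PER_INVOCATION, 10, true));
    System.out.println("never flushed, dropped: " + simulate(fullSize, 3, false));
  }
}
```

In this model, only the combination of a queue sized to `RECORDS_PER_INVOCATION * MAX_THREADS` and a flush after each invocation avoids drops; an undersized queue or a missing flush means part of the benchmark measures the cheap drop path instead of real recording.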

}
}

public enum SpanSize {
Member Author


Check this out: if we have individual parameters for the number of attributes, events, and links, we end up with a combinatorial explosion and a lot of noise. What we really want to characterize is the performance of different sizes of spans, where a size is a composite of several dimensions.
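
A rough sketch of the idea, for readers who don't have the diff open. The enum constants and counts below are made up for illustration and are not the values in this PR; `SpanSizeSketch` and `independentCombinations` are hypothetical names.

```java
/** Illustrative sketch only; the constants and counts are assumptions, not the PR's values. */
final class SpanSizeSketch {

  /** Each size bundles several dimensions so the benchmark has a single parameter. */
  enum SpanSize {
    SMALL(0, 0, 0),
    MEDIUM(10, 1, 0),
    LARGE(100, 10, 5);

    final int attributes;
    final int events;
    final int links;

    SpanSize(int attributes, int events, int links) {
      this.attributes = attributes;
      this.events = events;
      this.links = links;
    }
  }

  /** Number of benchmark cases if every dimension were its own independent parameter. */
  static int independentCombinations(int... valuesPerDimension) {
    int total = 1;
    for (int count : valuesPerDimension) {
      total *= count;
    }
    return total;
  }

  public static void main(String[] args) {
    // Three values for each of attributes, events, and links.
    System.out.println("independent params: " + independentCombinations(3, 3, 3) + " cases");
    System.out.println("composite sizes: " + SpanSize.values().length + " cases");
  }
}
```

With three values per dimension, independent parameters multiply out to 27 cases per benchmark method, while a composite size keeps it at 3, and each added dimension only widens the enum constructor instead of multiplying the case count again.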

@codecov

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.16%. Comparing base (87b0d9a) to head (fb8f290).

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #8031   +/-   ##
=========================================
  Coverage     90.16%   90.16%           
  Complexity     7484     7484           
=========================================
  Files           836      836           
  Lines         22562    22562           
  Branches       2237     2237           
=========================================
  Hits          20344    20344           
  Misses         1515     1515           
  Partials        703      703           
