# Rework span record benchmark and publish results #8031
base: main

Changes from 1 commit
**`BenchmarkUtils.java`** (new file):

```java
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.sdk;

public class BenchmarkUtils {

  /**
   * The number of record operations per benchmark invocation. By using a constant across benchmarks
   * of different signals, it's easier to compare benchmark results across signals.
   */
  public static final int RECORDS_PER_INVOCATION = 1024 * 10;
}
```
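Because each benchmark method performs `RECORDS_PER_INVOCATION` record operations per invocation, raw JMH scores are per-invocation rather than per-record. Below is a minimal sketch of how a benchmark could have JMH report per-record scores instead, using JMH's standard `@OperationsPerInvocation` annotation; this usage is an illustration only and does not appear in the diff:

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.OperationsPerInvocation;

public class NormalizedScoreSketch {

  // Hypothetical benchmark method: @OperationsPerInvocation tells JMH to
  // divide each invocation's measurement by this constant, so reported
  // scores are per-record rather than per-invocation.
  @Benchmark
  @OperationsPerInvocation(BenchmarkUtils.RECORDS_PER_INVOCATION)
  public void record() {
    // ... perform RECORDS_PER_INVOCATION record operations ...
  }
}
```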
**`MetricRecordBenchmark.java`** (modified):

```diff
@@ -43,15 +43,28 @@
 import org.openjdk.jmh.annotations.Warmup;
 
 /**
- * Notes on interpreting the data:
+ * This benchmark measures the performance of recording metrics and includes the following
+ * dimensions:
  *
- * <p>The benchmark has two dimensions which partially overlap: cardinality and thread count.
- * Cardinality dictates how many unique attribute sets (i.e. series) are recorded to, and thread
- * count dictates how many threads are simultaneously recording to those series. In all cases, the
- * record path needs to look up an aggregation handle for the series corresponding to the
- * measurement's {@link Attributes} in a {@link java.util.concurrent.ConcurrentHashMap}. That will
- * be the case until otel adds support for <a
- * href="https://github.com/open-telemetry/opentelemetry-specification/issues/4126">bound
+ * <ul>
+ *   <li>{@link BenchmarkState#instrumentTypeAndAggregation} composite of {@link InstrumentType} and
+ *       {@link Aggregation}, including all relevant combinations for synchronous instruments.
+ *   <li>{@link BenchmarkState#aggregationTemporality}
+ *   <li>{@link BenchmarkState#cardinality}
+ *   <li>thread count
+ *   <li>{@link BenchmarkState#instrumentValueType}, {@link BenchmarkState#memoryMode}, and {@link
+ *       BenchmarkState#exemplars} are disabled to reduce combinatorial explosion.
+ * </ul>
+ *
+ * <p>Each operation consists of recording {@link MetricRecordBenchmark#RECORDS_PER_INVOCATION}
+ * measurements.
+ *
+ * <p>The cardinality and thread count dimensions partially overlap. Cardinality dictates how many
+ * unique attribute sets (i.e. series) are recorded to, and thread count dictates how many threads
+ * are simultaneously recording to those series. In all cases, the record path needs to look up an
+ * aggregation handle for the series corresponding to the measurement's {@link Attributes} in a
+ * {@link java.util.concurrent.ConcurrentHashMap}. That will be the case until otel adds support for
+ * <a href="https://github.com/open-telemetry/opentelemetry-specification/issues/4126">bound
  * instruments</a>. The cardinality dictates the size of this map, which has some impact on
  * performance. However, by far the dominant bottleneck is contention. That is, the number of
  * threads simultaneously trying to record to the same series. Increasing the threads increases
```

> **Member (Author)**, on the javadoc above: One of my initial concerns with public benchmarks was that they need to be contextualized. To address this, I'd like to:
```diff
@@ -72,10 +85,11 @@
 public class MetricRecordBenchmark {
 
   private static final int INITIAL_SEED = 513423236;
-  private static final int RECORD_COUNT = 10 * 1024;
+  private static final int MAX_THREADS = 4;
+  private static final int RECORDS_PER_INVOCATION = BenchmarkUtils.RECORDS_PER_INVOCATION;
 
   @State(Scope.Benchmark)
-  public static class ThreadState {
+  public static class BenchmarkState {
 
     @Param InstrumentTypeAndAggregation instrumentTypeAndAggregation;
 
@@ -154,8 +168,8 @@ public void setup() {
       }
       Collections.shuffle(attributesList);
 
-      measurements = new ArrayList<>(RECORD_COUNT);
-      for (int i = 0; i < RECORD_COUNT; i++) {
+      measurements = new ArrayList<>(RECORDS_PER_INVOCATION);
+      for (int i = 0; i < RECORDS_PER_INVOCATION; i++) {
         measurements.add((long) random.nextInt(2000));
       }
       Collections.shuffle(measurements);
@@ -175,25 +189,26 @@ public void tearDown() {
   @Fork(1)
   @Warmup(iterations = 5, time = 1)
   @Measurement(iterations = 5, time = 1)
-  public void record_1Thread(ThreadState threadState) {
-    record(threadState);
+  public void record_SingleThread(BenchmarkState benchmarkState) {
+    record(benchmarkState);
   }
 
   @Benchmark
-  @Group("threads4")
-  @GroupThreads(4)
+  @Group("threads" + MAX_THREADS)
+  @GroupThreads(MAX_THREADS)
   @Fork(1)
   @Warmup(iterations = 5, time = 1)
   @Measurement(iterations = 5, time = 1)
-  public void record_4Threads(ThreadState threadState) {
-    record(threadState);
+  public void record_MultipleThreads(BenchmarkState benchmarkState) {
+    record(benchmarkState);
   }
 
-  private static void record(ThreadState threadState) {
-    for (int i = 0; i < RECORD_COUNT; i++) {
-      Attributes attributes = threadState.attributesList.get(i % threadState.attributesList.size());
-      long value = threadState.measurements.get(i % threadState.measurements.size());
-      threadState.instrument.record(value, attributes);
+  private static void record(BenchmarkState benchmarkState) {
+    for (int i = 0; i < RECORDS_PER_INVOCATION; i++) {
+      Attributes attributes =
+          benchmarkState.attributesList.get(i % benchmarkState.attributesList.size());
+      long value = benchmarkState.measurements.get(i % benchmarkState.measurements.size());
+      benchmarkState.instrument.record(value, attributes);
     }
   }
```
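The contention claim in the javadoc above can be pictured with a small self-contained sketch. This is not the SDK's actual storage code (the class and field names here are hypothetical); it only illustrates why cardinality sizes the map while contention on a shared series dominates cost:

```java
import io.opentelemetry.api.common.Attributes;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the SDK's per-series aggregation storage.
final class SeriesStorageSketch {

  // One entry per unique attribute set (series); cardinality sets the map size.
  private final ConcurrentHashMap<Attributes, AtomicLong> handles = new ConcurrentHashMap<>();

  void record(long value, Attributes attributes) {
    // Every measurement pays for this lookup until bound instruments
    // (opentelemetry-specification#4126) let callers hold a handle directly.
    AtomicLong handle = handles.computeIfAbsent(attributes, unused -> new AtomicLong());
    // With low cardinality and many threads, CAS retries on the same handle
    // dominate, which is the contention effect the javadoc describes.
    handle.addAndGet(value);
  }
}
```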
**`SpanRecordBenchmark.java`** (new file):

```java
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.sdk;

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import io.opentelemetry.api.trace.TraceFlags;
import io.opentelemetry.api.trace.TraceState;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.sdk.trace.IdGenerator;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Group;
import org.openjdk.jmh.annotations.GroupThreads;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Warmup;

/**
 * This benchmark measures the performance of recording spans and includes the following dimensions:
 *
 * <ul>
 *   <li>{@link BenchmarkState#spanSize}: the size of the span, which is a composite of the number
 *       of attributes, events, and links attached to the span.
 * </ul>
 *
 * <p>Each operation consists of recording {@link SpanRecordBenchmark#RECORDS_PER_INVOCATION} spans.
 *
 * <p>In order to isolate the record path while remaining realistic, the benchmark uses a {@link
 * BatchSpanProcessor} paired with a noop {@link SpanExporter}. In order to avoid quickly outpacing
 * the batch processor queue and dropping spans, the processor is configured with a queue size of
 * {@link SpanRecordBenchmark#RECORDS_PER_INVOCATION} * {@link SpanRecordBenchmark#MAX_THREADS} and
 * is flushed after each invocation.
 */
```

> **Member (Author)**, on the batch processor setup described above: This is a key aspect of a useful span record benchmark (and log record benchmark) IMO. We need to isolate from the export path, which is noisy due to the network dependency, while also being realistic. My definition of realistic is a batch span processor and a harness that makes sure that spans aren't just being dropped on the floor from a full queue.
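One way to make the "no spans dropped on the floor" property of such a harness verifiable, sketched here as an assumption rather than something this PR does: replace the noop exporter with a counting exporter and compare counts after each flush. `CountingExporter` is a hypothetical name; the methods implemented are the standard `SpanExporter` interface:

```java
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.data.SpanData;
import io.opentelemetry.sdk.trace.export.SpanExporter;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical drop-detection exporter: counts spans instead of sending them.
final class CountingExporter implements SpanExporter {

  final AtomicLong exported = new AtomicLong();

  @Override
  public CompletableResultCode export(Collection<SpanData> spans) {
    exported.addAndGet(spans.size());
    return CompletableResultCode.ofSuccess();
  }

  @Override
  public CompletableResultCode flush() {
    return CompletableResultCode.ofSuccess();
  }

  @Override
  public CompletableResultCode shutdown() {
    return CompletableResultCode.ofSuccess();
  }
}
```

After `forceFlush()` completes, `exported.get()` should equal the number of spans started in the invocation; any shortfall would mean the queue overflowed and the results are suspect.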
```java
public class SpanRecordBenchmark {

  private static final int RECORDS_PER_INVOCATION = BenchmarkUtils.RECORDS_PER_INVOCATION;
  private static final int MAX_THREADS = 4;
  private static final int QUEUE_SIZE = RECORDS_PER_INVOCATION * MAX_THREADS;

  @State(Scope.Benchmark)
  public static class BenchmarkState {

    // Encode a variety of dimensions (# attributes, # events, # links) into a single enum to
    // benchmark various shapes of spans without combinatorial explosion.
    @Param SpanSize spanSize;

    SdkTracerProvider tracerProvider;
    Tracer tracer;
    List<AttributeKey<String>> attributeKeys;
    List<String> attributeValues;
    List<Exception> exceptions;
    List<SpanContext> linkContexts;

    @Setup
    public void setup() {
      tracerProvider =
          SdkTracerProvider.builder()
              // Configure a batch processor with a noop exporter (SpanExporter.composite() is a
              // shortcut for a noop exporter). This allows testing the throughput / performance
              // impact of BatchSpanProcessor, which is essential for real workloads, while avoiding
              // noise from SpanExporters whose performance is subject to implementation and network
              // details.
              .addSpanProcessor(
                  BatchSpanProcessor.builder(SpanExporter.composite())
                      .setMaxQueueSize(QUEUE_SIZE)
                      .build())
              .setSampler(Sampler.alwaysOn())
              .build();
      tracer = tracerProvider.get("benchmarkTracer");

      attributeKeys = new ArrayList<>(spanSize.attributes);
      attributeValues = new ArrayList<>(spanSize.attributes);
      for (int i = 0; i < spanSize.attributes; i++) {
        attributeKeys.add(AttributeKey.stringKey("key" + i));
        attributeValues.add("value" + i);
      }

      exceptions = new ArrayList<>(spanSize.events);
      for (int i = 0; i < spanSize.events; i++) {
        exceptions.add(new Exception("test exception"));
      }

      linkContexts = new ArrayList<>(spanSize.links);
      for (int i = 0; i < spanSize.links; i++) {
        linkContexts.add(
            SpanContext.create(
                IdGenerator.random().generateTraceId(),
                IdGenerator.random().generateSpanId(),
                TraceFlags.getDefault(),
                TraceState.getDefault()));
      }
    }

    @TearDown(Level.Invocation)
    public void flush() {
      tracerProvider.forceFlush().join(10, TimeUnit.SECONDS);
    }

    @TearDown
    public void tearDown() {
      tracerProvider.shutdown();
    }
  }

  @Benchmark
  @Group("threads1")
  @GroupThreads(1)
  @Fork(1)
  @Warmup(iterations = 5, time = 1)
  @Measurement(iterations = 5, time = 1)
  public void record_SingleThread(BenchmarkState benchmarkState) {
    record(benchmarkState);
  }

  @Benchmark
  @Group("threads" + MAX_THREADS)
  @GroupThreads(MAX_THREADS)
  @Fork(1)
  @Warmup(iterations = 5, time = 1)
  @Measurement(iterations = 5, time = 1)
  public void record_MultipleThreads(BenchmarkState benchmarkState) {
    record(benchmarkState);
  }

  private static void record(BenchmarkState benchmarkState) {
    for (int i = 0; i < RECORDS_PER_INVOCATION; i++) {
      Span span = benchmarkState.tracer.spanBuilder("test span name").startSpan();
      for (int j = 0; j < benchmarkState.attributeKeys.size(); j++) {
        span.setAttribute(
            benchmarkState.attributeKeys.get(j), benchmarkState.attributeValues.get(j));
      }
      for (int j = 0; j < benchmarkState.exceptions.size(); j++) {
        span.recordException(benchmarkState.exceptions.get(j));
      }
      for (int j = 0; j < benchmarkState.linkContexts.size(); j++) {
        span.addLink(benchmarkState.linkContexts.get(j));
      }
      span.end();
    }
  }
```

> **Member (Author)**, on the `SpanSize` parameter: Check this out: if we have individual parameters for the number of attributes, events, and links, we end up with combinatorial explosion and a lot of noise. What we really want to characterize is the performance of different sizes of spans, where a size is a composite of a variety of dimensions.
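A sketch of the rejected alternative described in the comment above, written as an assumption for contrast (none of this appears in the PR): independent JMH parameters run as a cross product, so three values per dimension would mean 3 × 3 × 3 = 27 combinations per thread group, versus 3 with the composite enum.

```java
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Hypothetical alternative: one @Param per dimension. JMH benchmarks every
// combination of @Param values, so this state yields 3 * 3 * 3 = 27 runs,
// many of them unrealistic span shapes (e.g. 100 attributes, 0 events,
// 0 links) that add noise without insight.
@State(Scope.Benchmark)
public class ExplodingBenchmarkState {

  @Param({"0", "10", "100"})
  int attributeCount;

  @Param({"0", "1", "10"})
  int eventCount;

  @Param({"0", "1", "5"})
  int linkCount;
}
```

The committed `SpanSize` enum collapses these dimensions into three realistic shapes: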
```java
  public enum SpanSize {
    SMALL(0, 0, 0),
    MEDIUM(10, 1, 0),
    LARGE(100, 10, 5);

    private final int attributes;
    private final int events;
    private final int links;

    SpanSize(int attributes, int events, int links) {
      this.attributes = attributes;
      this.events = events;
      this.links = links;
    }
  }
}
```
> If span, metric, and log benchmarks all record the same number of operations per benchmark invocation, we can see the relative cost of spans vs. logs vs. metrics. Even though it will never be a perfect apples-to-apples comparison, it's still useful to know the order-of-magnitude cost of the different signals.
>
> Make sense?
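As a worked example of the cross-signal comparison this enables (scores invented purely for illustration): 2.0 ms per invocation for spans and 0.2 ms for metrics, both over the shared 10,240 records, works out to roughly 195 ns per span versus 20 ns per measurement. A small hedged helper for that arithmetic:

```java
public final class CrossSignalComparison {

  // Because every signal's benchmark performs BenchmarkUtils.RECORDS_PER_INVOCATION
  // record operations per invocation, per-invocation scores convert to per-record
  // cost identically for spans, metrics, and logs.
  static double nanosPerRecord(double avgNanosPerInvocation) {
    return avgNanosPerInvocation / BenchmarkUtils.RECORDS_PER_INVOCATION;
  }

  public static void main(String[] args) {
    // Invented example scores, purely illustrative.
    System.out.println(nanosPerRecord(2_000_000.0)); // spans: ~195.3 ns per record
    System.out.println(nanosPerRecord(200_000.0)); // metrics: ~19.5 ns per record
  }
}
```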