Add development maxExportBatchSize configuration to Periodic MetricReader #4895
dashpole wants to merge 7 commits into open-telemetry:main from
Conversation
Force-pushed from 04e847b to 581ba77
@dashpole:
I tried to capture that requirement here:
cc @open-telemetry/specs-metrics-approvers
Got it. I was assuming the ordering within the collect, given the incoming spec on per-series timestamps.
I rephrased the requirement to
@rajkumar-rangaraj Please review to see if it is feasible to implement this efficiently in .NET. This would be useful in the ETW/User-Events metric exporter, which has a hard 64 KB limit; it currently achieves this by writing each metric point in one call. @utpilla @lalitb Same for the Rust etw/user-event exporters.
@cijothomas you are correct, this would help in the ETW/User-Events exporter scenario. This is feasible to implement in .NET. We already have
I support the direction of this PR. There are a couple of places we need to consider before I give my approval:
It seems like the Periodic MetricReader should not Shutdown the exporters until after it has exported all of the batches it needs to. I'm not sure the shutdown method for an exporter would need to change here.
Again, I think this is the Periodic MetricReader's responsibility to properly order calls to the exporter. If the user calls ForceFlush on the MeterProvider, I would expect it to first collect, then batch and export, and then call force flush. Part of what is confusing here is that the spec seems to suggest that push exporters can call collect?
But the Periodic MetricReader spec seems pretty clear that it is what is supposed to do the collecting...
Force-pushed from 3a255a8 to 977309e
Right, and we need to update the spec. Let me give a concrete example: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#forceflush-1 says "ForceFlush SHOULD collect metrics, call Export(batch) and ForceFlush() on the configured Push Metric Exporter." This needs to be updated, since we're going to make multiple calls: Export(batch1), Export(batch2), ..., and finally ForceFlush(). If we hit a timeout, what should we do?
Where do you see that? Can you paste the specific wording here?
Got it. I looked through all the shutdown/forceflush specs, and I think the only one that needs updating is the MetricReader.ForceFlush one you identified: a0403e9.
I was reading the wrong ForceFlush again :)
Since the export calls are serial, the reader should handle it the same way they currently do (probably return a timeout error). I think this is covered by existing language:
If reader ForceFlush is reaching the time limit, it SHOULD not try to call exporter.Export for the remaining batches, and make a final call to exporter.ForceFlush, so the exporter is notified and has a chance to take action. |
I'm not sure... The current spec recommends that the MetricReader abort if it reaches the timeout today -- even if it hasn't yet gotten to call ForceFlush on the exporter. I suppose there is enough flexibility for the MetricReader to try to be smart and reserve some of the timeout for ForceFlush if it wants (i.e. apply an earlier timeout to Export). But that feels like a change from where the spec is today without batch splitting, and orthogonal to this change.
I disagree. "ForceFlush SHOULD collect metrics, call Export(batch) and ForceFlush() on the configured Push Metric Exporter." I think we need to update the semantic by saying "you don't want to call Export() for all the batches if you're reaching the timeout limit, but you SHOULD still call ForceFlush in the end." |
Addressed in 382f8e6
I think the approvals represent good language coverage, so I'm planning to merge this tomorrow unless there are objections.
> batches. The initial batch of metric data MUST be split into as many "full"
> batches of size `maxExportBatchSize` as possible -- even if this splits up data
> points that belong to the same metric into different batches. The reader MUST
> ensure all metric data points from a single `Collect()` are provided to
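A minimal sketch of the splitting rule quoted above, assuming a plain slice stands in for the metric data points (names are illustrative, not SDK API):

```go
package main

import "fmt"

// splitBatches splits points into batches of at most max points each.
// Every batch except possibly the last is "full" (exactly max points),
// even when that separates points of the same metric across batches.
func splitBatches(points []int, max int) [][]int {
	var batches [][]int
	for len(points) > max {
		batches = append(batches, points[:max])
		points = points[max:]
	}
	if len(points) > 0 {
		batches = append(batches, points)
	}
	return batches
}

func main() {
	// 250 points with maxExportBatchSize=100 yield two full batches
	// and one partial batch.
	b := splitBatches(make([]int, 250), 100)
	fmt.Println(len(b), len(b[0]), len(b[2])) // 3 100 50
}
```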
Is this assuming we won't time out before finishing all metric data points from collect1()?
If we had 1000 points and 100 is max_batch_size, and we time out at the 8th batch: during the next collect2(), do we start over rather than continuing from the 8th batch of the previous collect?
The intent is that the timeout is still applied to a single export call, so we would attempt each Export. The problem that could arise is that the SDK generates so many batches that it ends up delaying the next export call. Ideally, language implementations would delay the next Collect, rather than continuously spawning new collections when the previous one hasn't completed.
We could alternatively change the timeout to apply to a collect and export(s) cycle, and adopt behavior similar to what you are suggesting (where once we timeout, we discard all unsent batches).
Fixes #4852
Prior Art
The Trace SDK and Logging SDK both support a `maxExportBatchSize` parameter to limit the number of spans/logs exported in a batch. The collector's exporter helper and batch processor support a `send_batch_max_size` configuration option, which (by default) applies to the number of spans, logs, or metric data points. In all cases, the configured timeout applies to a single request.

Requirements
Non-goals
Proposal
Add `maxExportBatchSize` to the periodic exporting MetricReader. The periodic exporting MetricReader splits the batch of metric data points received from Collect, if necessary, and then serially invokes `Export` on each split batch with the configured timeout.

Alternatives considered
maxExportBatchSize for all MetricReaders
Instead of applying to only periodic readers, the batch size could apply to all readers. This alternative is not chosen because
maxExportBatchSize on OTLP exporters
Instead of being on the periodic exporting MetricReader, we could add this configuration on the OTLP http and grpc exporters. This alternative is not chosen because:
Prototypes:
Go: PoC for batching in PeriodicReader opentelemetry-go#7930
Links to the prototypes (when adding or changing features)
`CHANGELOG.md` file updated for non-trivial changes
Spec compliance matrix updated if necessary