Skip to content

Conversation

@ldufr
Copy link

@ldufr ldufr commented Jan 6, 2026

Description

This change add support for a code path to pdata generator that lazily deserialize the metrics field in ScopedMetrics. By default the existing behavior isn't changed, but a caller could decide the lazily deserialize.

This approach gives few advantages:

  • We can cap the peak memory usage which allows the processing of larger messages without running OOM.
  • We enable the re-use of the memory used for deserializing Metric which plays particularly well with the experimental feature pdata.useProtoPooling.

Additional note:

  1. This is only implemented for protobuf messages. It could be done for json, but additional support would be needed. I suspect the benefits would be less interesting, because skipping a field isn't as straightforward.
  2. I split the commits carefully to illustrate two cases. The first one is to simply keep the bytes around for when we want to deserialize it. It's faster, but push some errors to the point when we deserialize the message. That is, invalid protobuf message could still be unmarshalled and only when we deserialize the Metric would the errors happen. This is add difficulty to keep the API compatible, because calling .At(), or .All() could fail. Furthermore, the official protobuf FieldDescriptor support a "lazy" annotation, see here, and the comment says that the message should be eagerly verified.
  3. We probably want to only generate lazy definition when necessary, that is, if the type is used at least once as a lazy type. This is still to be done.
  4. To keep compatibility with the deltatocumulative, At() needs to deserialize in it's own state, because the internal data may be modified. This in turn forces to creation of a different function.

Link to tracking issue

Fixes #14246 (check for additional information)

Testing

This was tested and benchmarked in a Mimir deployment as well as a benchmark.

Documentation

TODO

@ldufr ldufr requested review from a team, bogdandrutu, dmathieu, dmitryax and mx-psi as code owners January 6, 2026 12:53
Comment on lines +73 to +82
func (es {{ .structName }}) Unmarshal(i int, m *{{ .elementName }}) error {
var buf internal.{{ .elementOriginName }}
res, err := (*es.{{ .origAccessor }})[i].FinishUnmarshal(&buf.{{ .elementName }})
if err != nil {
return err
}
*m = new{{ .elementName }}(res, es.{{ .stateAccessor }})
return nil
}
{{- end }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not publicly expose this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is definitely too much details, but I'm not sure we can easily get rid of the functionality it does.

I considered few other name, like ReadInto, Get, but didn't find anything great. I wonder if you have any better name in mind?

I don't think it would ever be possible for this function to error, because we eagerly check the wire format, but hard to be 100%. Also, regardless whether we error or not, I don't think we can replace the At implementation, because some code calls At(), then modify the data, expecting it will modify the original document. This is why, the At deserialize the data in the document, which won't break existing code, but makes the lazy deserializing useless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[pdata] Support lazily deserializing OTLP messages

2 participants