[DO NOT MERGE] profiles: more efficient and flexible original_payload handling.#786
[DO NOT MERGE] profiles: more efficient and flexible original_payload handling.#786jhalliday wants to merge 2 commits intoopen-telemetry:mainfrom
Conversation
|
Placing original_payload_format within the Profile message is inefficient in the case where the original format contains multiple streams of data, i.e. converts into multiple OTel Profiles. This is the case when e.g. JFR captures CPU and memory use data concurrently. Likewise it is possible, though less common, that a single Profile combines data from multiple original files, such as where files for consecutive time periods are combined by batching. Since the payload can be large, it is desirable to ensure no unnecessary duplication of data on the wire. Note that there are related issues not addressed here: loss of the original file name, and lack of any way to identify which data streams within a file correspond to which Profiles. These issues are out of scope for this change and more likely to be addressed by semantic conventions than by protocol format changes e.g. defining attribute keys that can be used for these purposes. Potentially breaking changes to the [Alpha] protocol are deferred pending more usage testing and feedback, to avoid interoperability issues at this time. This change is [DO NOT MERGE] for now and will be batched up with any other breaking changes at a future time. |
|
@open-telemetry/profiling-maintainers |
|
A bit unusual to have such a large value in the dictionary but makes sense overall. |
|
Switched to using OriginalDataSource, a new message type that allows each payload content to be accompanied by attributes. These can be used via semconv to e.g. associate a file name with the data. |
| // The format(s) of each file referenced by the corresponding original_payload_indices. | ||
| // Indexes into ProfilesDictionary.string_table. | ||
| repeated int32 original_payload_format_stridx = 12; | ||
| // The original payload bytes. See also original_payload_format_stridx. | ||
| // Optional, but format and the bytes must be set or unset together. | ||
| // i.e. original_payload_format_stridx and original_payload_indices must be arrays of equal length. | ||
| // Indexes into ProfilesDictionary.original_payload_table. | ||
| repeated int32 original_payload_indices = 13; |
There was a problem hiding this comment.
As there is a SemConv attribute dedicated to file formats, file.extension, I'm wondering if we can drop this field here.
| // The format(s) of each file referenced by the corresponding original_payload_indices. | |
| // Indexes into ProfilesDictionary.string_table. | |
| repeated int32 original_payload_format_stridx = 12; | |
| // The original payload bytes. See also original_payload_format_stridx. | |
| // Optional, but format and the bytes must be set or unset together. | |
| // i.e. original_payload_format_stridx and original_payload_indices must be arrays of equal length. | |
| // Indexes into ProfilesDictionary.original_payload_table. | |
| repeated int32 original_payload_indices = 13; | |
| // The original payload bytes. | |
| // Indexes into ProfilesDictionary.original_payload_table. | |
| repeated int32 original_payload_indices = 12; |
| // The file or buffer content | ||
| bytes content = 1; | ||
| // References to attributes in ProfilesDictionary.attribute_table. [optional] | ||
| repeated int32 attribute_indices = 2; |
There was a problem hiding this comment.
Should we make file.extension and/or file.name a required attribute?
Move original_payload data to a dictionary table to avoid duplication when a single payload encodes to multiple Profiles.