Profiles: Replace has_* fields with an enum. #595

Open: wants to merge 1 commit into base: main
60 changes: 55 additions & 5 deletions opentelemetry/proto/profiles/v1development/profiles.proto
@@ -423,6 +423,54 @@ message Label {
int64 num_unit = 4; // Index into string table
}

// Specifies the availability of the function and file names, line numbers and
// inline frames for a mapping.
enum SymbolizationLevel {
@christos68k (Member), Oct 31, 2024

@aalexand Why not have this as an attribute? (I admit I already forgot the relevant part in today's discussion). Looked up the mutability point in the agenda, and just to clarify: wouldn't this be either set once, at origin or if not set at origin, at a later point (e.g. in a processor)? Trying to understand why an attribute wouldn't be a fit here.

If it's just a matter of having to remove a previously set attribute, wouldn't this only apply to SYMBOLIZATION_LEVEL_UNSPECIFIED (which we can define as the absence of the attribute) or would we need more elaborate processing that takes more cases into account?

One reason that springs to mind is implementors not having to deal with OTel KeyValue, which IIRC is a concern that you had in the past. Just wondering if there is something else, feel free to correct my assumptions 😅

@aalexand (Member, Author), Nov 1, 2024

The most elaborate "life of a profile" in terms of symbolization that I can think of is something like:

  1. A collector gathers a C++ profile. It sets the symbolization level to SYMBOLIZATION_LEVEL_NONE to indicate that only IP addresses are available and nothing else.
  2. The profile is transferred to the backend and stored there.
  3. The offline symbolizer processes the profile. At this point, initially, only a stripped binary is available to the symbolizer, so it symbolizes the profile at the SYMBOLIZATION_LEVEL_SYMBOLS level.
  4. The offline symbolizer at some point gets full DWARF for the binary and re-symbolizes the profile, storing it with the SYMBOLIZATION_LEVEL_LINES_INLINE level now.

The re-symbolization part is not something that all profilers may want to support, but I wanted to indicate that the level upgrade is possible.
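
The level upgrade in the steps above can be sketched roughly as follows. This is only an illustration and not part of the PR: the Python `SymbolizationLevel` mirrors the proposed enum values, while `Mapping` and `resymbolize` are hypothetical stand-ins, and treating a numerically larger value as "more complete" is an assumption the proto does not state.

```python
from dataclasses import dataclass
from enum import IntEnum


class SymbolizationLevel(IntEnum):
    # Values mirror the enum proposed in this PR.
    SYMBOLIZATION_LEVEL_UNSPECIFIED = 0
    SYMBOLIZATION_LEVEL_NONE = 1
    SYMBOLIZATION_LEVEL_SYMBOLS = 2
    SYMBOLIZATION_LEVEL_LINES_NOINLINE = 3
    SYMBOLIZATION_LEVEL_LINES_INLINE = 4


@dataclass
class Mapping:
    # Hypothetical stand-in for the proto message; only the new field is modeled.
    symbolization_level: SymbolizationLevel = (
        SymbolizationLevel.SYMBOLIZATION_LEVEL_UNSPECIFIED
    )


def resymbolize(mapping: Mapping, achieved: SymbolizationLevel) -> bool:
    """Store `achieved` if it improves on the current level.

    Returns True when the mapping was updated. Assumption: a larger enum
    value means more complete symbolization.
    """
    if achieved > mapping.symbolization_level:
        mapping.symbolization_level = achieved
        return True
    return False


# Step 1: the collector emits addresses only.
m = Mapping(SymbolizationLevel.SYMBOLIZATION_LEVEL_NONE)
# Step 3: only a stripped binary is available to the offline symbolizer.
resymbolize(m, SymbolizationLevel.SYMBOLIZATION_LEVEL_SYMBOLS)
# Step 4: full DWARF arrives later and the profile is re-symbolized.
resymbolize(m, SymbolizationLevel.SYMBOLIZATION_LEVEL_LINES_INLINE)
```

With a plain field, each upgrade is a single assignment, and a later attempt that achieves a lower level leaves the stored value untouched.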

Of course, replacing an attribute in a proto in memory is not a huge deal - iterate over the repeated opentelemetry.proto.common.v1.KeyValue attributes list and replace or append the attribute with the desired key and new value, but it's still microchurn and for something that is so common as symbolization level it feels that having a simple explicit field is nicer. But it's definitely not a dealbreaker, so I'd love to hear what others think.
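
For comparison, the replace-or-append loop described above might look like the sketch below. The `KeyValue` class is a simplified, hypothetical stand-in for opentelemetry.proto.common.v1.KeyValue (the real message wraps its value in an AnyValue), and the `symbolization.level` key and its string values are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyValue:
    # Simplified stand-in: the real KeyValue carries an AnyValue, not a str.
    key: str
    value: str


def set_attribute(attributes: List[KeyValue], key: str, value: str) -> None:
    """Replace the value of the first attribute named `key`, or append one."""
    for kv in attributes:
        if kv.key == key:
            kv.value = value
            return
    attributes.append(KeyValue(key, value))


attrs = [KeyValue("build.id", "abc123")]
# First symbolization pass: the key is absent, so it is appended.
set_attribute(attrs, "symbolization.level", "symbols")
# Re-symbolization: the existing entry is overwritten in place.
set_attribute(attrs, "symbolization.level", "lines_inline")
```

The loop is short, but every writer has to carry some version of it, which is the microchurn an explicit field avoids.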

As a side note, I wonder if it would be better if repeated opentelemetry.proto.common.v1.KeyValue attributes were a map<Key, Value> instead. I thought maybe the repeated form is there to support multi-value attributes, but there is explicit support for lists in this type, so it's not that.

One other thing with commonly used attributes is that the definition of allowed values in their semantics is so far away from the proto itself. With an enum one gets compile-time checks, code completion, etc. With attributes I don't even know what kind of presubmit or code generation support exists. This aspect probably shouldn't solely guide our decisions on what should be an attribute vs a field, but I think it's still a factor.

// Unknown or unassigned. Clients can try to determine the actual level
// heuristically from the presence of function and file names, line numbers
// and inline frames.
SYMBOLIZATION_LEVEL_UNSPECIFIED = 0;
// No symbolization was attempted for the mapping and it's known to be a
// native mapping with no pre-populated symbol information. Function and file
// names, line numbers and inline frames are absent for all locations for the
// mapping.
//
// This level is common for a native (e.g. C++) mapping in a profile emitted
// by a production profiling collector since no symbolization is typically
// attempted on the host as debug information is usually not shipped to
// production machines.
//
// The level is rarely used with managed language mappings like Java since
// symbolization for those languages is typically done on the host.
SYMBOLIZATION_LEVEL_NONE = 1;
// Limited symbol information is available: function names are assigned but
// may be imprecise; file names, line numbers and inline frames are missing.
//
// This level is encountered when the symbolization is performed for a C++
// binary that has symbol table (.symtab) present, but no DWARF. Such a symbol
// table records top-level symbol names, but it won't have the more granular
// function and line breakdown.
//
// The level is rarely used with managed languages like Java since their
// symbolization information is typically more complete.
SYMBOLIZATION_LEVEL_SYMBOLS = 2;
// Limited debug information is available: function / file names and line
// numbers are assigned but the inline frames are not available.
//
// This is a somewhat exotic case specific to C++ binaries with split DWARF
// information. When symbolization is done against such a binary and the *.dwp
// file is not available, the DWARF is available only partially which results
// in this more complete but still partial symbolization level.
//
// This level is never practically useful for managed languages.
SYMBOLIZATION_LEVEL_LINES_NOINLINE = 3;
// Full, most-desired level of symbolization. All of function and file names,
// line numbers and inline frames are available. This level indicates that
// full debug information was available for the binary. It is also what
// managed languages like Java provide.
SYMBOLIZATION_LEVEL_LINES_INLINE = 4;
}

// Describes the mapping of a binary in memory, including its address range,
// file offset, and metadata like build ID
message Mapping {
@@ -440,11 +488,13 @@ message Mapping {
int64 filename = 5; // Index into string table
// References to attributes in Profile.attribute_table. [optional]
repeated uint64 attributes = 12;
- // The following fields indicate the resolution of symbolic info.
- bool has_functions = 7;
- bool has_filenames = 8;
- bool has_line_numbers = 9;
- bool has_inline_frames = 10;
+ bool has_functions = 7; // Deprecated, to be removed.
+ bool has_filenames = 8; // Deprecated, to be removed.
+ bool has_line_numbers = 9; // Deprecated, to be removed.
+ bool has_inline_frames = 10; // Deprecated, to be removed.
+ // The level of availability of the function and file names, line numbers and
+ // inline frames for the mapping. [optional]
+ SymbolizationLevel symbolization_level = 13;
}

// Describes function and line table debug information.
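
While the deprecated has_* fields and the new symbolization_level coexist, a reader of older profiles may want to derive a level from the four booleans. The sketch below is one possible mapping and not something this PR defines: `level_from_flags` is a hypothetical helper, and the choice to return UNSPECIFIED for every combination it cannot classify (including all-false, which cannot distinguish "not attempted" from "unknown") is an assumption.

```python
from enum import IntEnum


class SymbolizationLevel(IntEnum):
    # Values mirror the enum proposed in this PR.
    SYMBOLIZATION_LEVEL_UNSPECIFIED = 0
    SYMBOLIZATION_LEVEL_NONE = 1
    SYMBOLIZATION_LEVEL_SYMBOLS = 2
    SYMBOLIZATION_LEVEL_LINES_NOINLINE = 3
    SYMBOLIZATION_LEVEL_LINES_INLINE = 4


def level_from_flags(
    has_functions: bool,
    has_filenames: bool,
    has_line_numbers: bool,
    has_inline_frames: bool,
) -> SymbolizationLevel:
    """Map the deprecated Mapping.has_* booleans onto a SymbolizationLevel."""
    flags = (has_functions, has_filenames, has_line_numbers, has_inline_frames)
    if flags == (True, True, True, True):
        return SymbolizationLevel.SYMBOLIZATION_LEVEL_LINES_INLINE
    if flags == (True, True, True, False):
        return SymbolizationLevel.SYMBOLIZATION_LEVEL_LINES_NOINLINE
    if flags == (True, False, False, False):
        return SymbolizationLevel.SYMBOLIZATION_LEVEL_SYMBOLS
    # All-false and mixed combinations are ambiguous, so stay conservative.
    return SymbolizationLevel.SYMBOLIZATION_LEVEL_UNSPECIFIED
```

A stricter variant could map all-false to SYMBOLIZATION_LEVEL_NONE when the producer is known to be a native collector, but that knowledge lives outside the four flags.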