Move process executable to its own entity#3536
Move process executable to its own entity#3536osullivandonal wants to merge 10 commits intoopen-telemetry:mainfrom
Conversation
This is being done as per feedback from updating the process attributes requirement levels work.The advantage to moving process.executable to its own entity to to allow a relationship between process and process.executable, for example you could look at all processes beloning to a single executable.
Added build_id.go and build_id.gnu, build_id.htlhash
01a1e0e to
f17cec7
Compare
ChrsMark
left a comment
There was a problem hiding this comment.
LGTM
Let's file an issue for #3536 (comment) so we don't miss that.
New issue created #3563 |
| @@ -1,2 +1,2 @@ | |||
| groups: | |||
| - id: entity.process | |||
There was a problem hiding this comment.
maybe not directly in scope of this pr, but should we call this entity process instance instead? This is similar to service instance entity and describes what it is more precisely.
There was a problem hiding this comment.
My hesitation with that is that we won't change the namespace of any of the metrics or attributes to process.instance.* so having the entity just be called process feels like it maps well to how the other names are all shaped.
braydonk
left a comment
There was a problem hiding this comment.
I usually don't press the "Request Changes" button cause it feels aggressive lol, just doing it here because my comment is intended to block merge.
| A single executable may be run my multiple process instances. | ||
| attributes: | ||
| - ref: process.executable.path | ||
| role: identifying |
There was a problem hiding this comment.
I don't think the executable path is identifying. While a process is running the path is able to change on all platforms our conventions actively target. On Linux it has no effect on the process's view of what binary it's running; it keeps a handle to the file on process startup and if the executable gets moved, the process's handle to the file is the same. So if this attribute is intended to be collected via /proc/pid/exe, this totally could change during the lifetime of the process even though it doesn't change the binary the process is running.
Maybe the profiling group has a different opinion here, so perhaps @christos68k could chime in here. Is it important that an executable actually be identified by its path when the process first starts for profiling usecases?
There was a problem hiding this comment.
To go along with this, if this is in fact not to be made an identifying attribute, then we need to actually make the htlhash the identifying attribute.
There was a problem hiding this comment.
Just catching up now as I have been on PTO the last week. This makes sense to me @braydonk, making the process.executable.build_id.htlhash the identifying attribute and we could make the process.executable.path a descriptive attribute?
There was a problem hiding this comment.
So for me it comes down to what is the entity intending to identify:
- Is it the logical process executable being described? If so makes sense to make htl-hash identifying as it will be consistent across all instances of the executable regardless of where it is running. In which case I don't think path can be descriptive as if you have multiple copies of the executable at different paths then you don't have reproducable id
- is it for identifying a runtime instance of the executable in which case we need path or some derivative from the location as identifying so we can differentiate which instance of the process executable is being run.
My thoughts would be:
- make htlhash identifying
process.executable - Remove
process.executable.pathfrom all entities and ensure it remains experimental.
In future pr:
- define method of generating
app.installation.idfor non mobile app installation ie use directory id. - rename
process.executable.pathtoapp.installation.pathand add it to anapp.installationentity. This way you can capture usage of each copy (installation) of the executable.
Alternatively options would be to:
- use
deployment.instancenamespace instead ofappbut it doesn't fit right to me. - introduce another
process.*entity for describing an installation of the executable.
I like the app option as it is clear that it describing an installation but it needs more discussion hence in this PR I would focus on the first 2 steps.
There was a problem hiding this comment.
is it for identifying a runtime instance of the executable in which case we need path or some derivative from the location as identifying so we can differentiate which instance of the process executable is being run.
I think I understand the general goal of that, but the executable path still can't be used to identify this. The path of the executable can change during the process lifetime, and it doesn't change what binary the process is running.
Remove process.executable.path from all entities and ensure it remains experimental.
I'm not advocating for the attribute to be removed, just made descriptive instead of identifying. I don't think it's a harm to keep the attribute around and still potentially do the other stuff you mentioned in a followup.
There was a problem hiding this comment.
In which case I don't think path can be descriptive as if you have multiple copies of the executable at different paths then you don't have reproducable id
I'm not understanding this. I think this is fundamentally true, right? Like it's not a matter of modeling as telemetry, this is just fundamentally the case with binaries. The path of the binary being run is by nature unstable, so can never be relied on to create a reproducible ID.
There was a problem hiding this comment.
I'm not advocating for the attribute to be removed, just made descriptive instead of identifying.
I think this is fundamentally true, right? Like it's not a matter of modeling as telemetry, this is just fundamentally the case with binaries. The path of the binary being run is by nature unstable, so can never be relied on to create a reproducible ID.
The problem I foresee with it being descriptive is what do you expect the value of path to be on the process.executable entity if you have 2 copies of the same executable running on the same machine.
Key thing is should the entity be representing an instance of the executable or is it just the executable which can exist in many locations.
There was a problem hiding this comment.
I updated this, now:
process.executable.build_id.htlhashis the identifying attributeprocess.executable.pathhas been moved to a descriptive attribute
cc. @braydonk
There was a problem hiding this comment.
Key thing is should the entity be representing an instance of the executable or is it just the executable which can exist in many locations.
I believe we are modeling it exactly how it works in the operating system here, so I'm not sure I'm understanding the concerns.
Consider the scenario where PID 1 and PID 2 both are running the executable at /usr/bin/program. There will be a different process entity for each process, but the process.executable entity attached to the telemetry for each will be the same. This is the same as what's actually going on in the system. They are two different processes, but they are running the same executable and there's no actual distinction to be made. The distinct information is about the actual process object, not about the executable file.
There was a problem hiding this comment.
I have no issue with your use case:
Consider the scenario where
PID 1andPID 2both are running the executable at/usr/bin/program.
My concern is the following scenario:
PID 1andPID 2both are running the same executable butPID 1is a copy of the executable at/usr/bin/program&PID 2is a copy of the executable at/usr/bin/program2.
If they both use the same process.executable entity what are we expecting the value of the path attribute to be on the entity.
|
@open-telemetry/specs-semconv-maintainers this is ready to be merged from the System SemConv SIG side :) |
it looks like there are still a couple of open conversations |
|
@braydonk could you review the conversations and close them if it makes sense? |
Fixes #3535
Changes
Move process.executable to its own entity
This is being done as per feedback from updating the process attributes
requirement levels work, see PR: #3461.
The advantage to moving process.executable to
its own entity to to allow a relationship between process and
process.executable, for example you could look at all processes belonging
to a single executable.
The new identifying attributes for the new entity
process.executableare:process.executable.nameprocess.executable.pathImportant
Pull requests acceptance are subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above, may be automatically rejected and closed.
Merge requirement checklist
[chore]