Add Pressure Stall Information (PSI) metrics (reopened #2996) #3068
base: main
# Conflicts:
# docs/system/system-metrics.md
Co-authored-by: James Thompson <[email protected]>
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@thompson-tomo @braydonk @trask
@alpineQ can you rebase/merge in master as the doc templates have been updated.
@thompson-tomo any updates on this?
thompson-tomo left a comment:
Docs and definitions look good to me based on published guidance & clarification.
hi @alpineQ, this will need review and approval from @open-telemetry/semconv-system-approvers
@trask do these @open-telemetry/semconv-system-approvers really exist or only you can see them? 🤣
This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.
@alpineQ Apologies for the delayed response. The group has been focused on delivering the first stable release of a subset of system metrics, and unfortunately this PR slipped through the cracks. I’ve also noticed that we’re attempting to add a memory pressure metric for Darwin as well (open-telemetry/opentelemetry-collector-contrib#45154). This made me wonder whether we could agree on a cross-platform, generic naming scheme for pressure metrics (for example, system.cpu.pressure). Since I’m not very familiar with how this concept is handled across other platforms, I’ve added this topic to the agenda for our next SIG meeting (08/01/2026) so we can discuss it together.
@alpineQ in light of open-telemetry/opentelemetry-collector-contrib#45154, it appears memory pressure is also applicable to macOS. Should we split based on resource type, which would mean we end up with:
IO would become disk, network or other depending on what it refers to. This way these metrics are complementing
We would then note in the description that it comes from PSI.
Closes #2995
Changes
This PR adds support for Linux Pressure Stall Information (PSI) metrics to the system semantic conventions.
PSI is a Linux kernel feature (available since kernel 4.20) that identifies and quantifies resource contention by measuring the time impact that CPU, memory, and I/O resource crunches have on workloads.
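To make the proposed data model concrete, here is a minimal Python sketch, not the collector's implementation. It assumes the kernel's /proc/pressure/{cpu,memory,io} file format: a "some" line and, where supported, a "full" line, each carrying avg10/avg60/avg300 stall percentages and a cumulative "total" in microseconds (the cpu file may contain only a "some" line on older kernels). The helper names parse_psi, read_psi, to_data_points and stall_percent are hypothetical, and the metric and attribute names are the ones proposed in this PR, not an existing API.

```python
from typing import Dict, List, Tuple

# Assumed format of each line in /proc/pressure/{cpu,memory,io}:
#   some avg10=1.50 avg60=0.80 avg300=0.20 total=4321000
# avg* are stall percentages; total is cumulative stall time in microseconds.
Fields = Dict[str, float]
DataPoint = Tuple[str, float, Dict[str, str]]

# Kernel field name -> proposed system.psi.window attribute value (hypothetical mapping).
WINDOWS = {"avg10": "10s", "avg60": "60s", "avg300": "300s"}


def parse_psi(text: str) -> Dict[str, Fields]:
    """Parse the contents of a PSI file into {stall_type: {field: value}}."""
    result: Dict[str, Fields] = {}
    for line in text.strip().splitlines():
        stall_type, *pairs = line.split()
        fields: Fields = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # total is an integer counter; the averages are decimal percentages.
            fields[key] = int(raw) if key == "total" else float(raw)
        result[stall_type] = fields
    return result


def read_psi(resource: str) -> Dict[str, Fields]:
    """Read /proc/pressure/<resource>; raises OSError if PSI is unavailable."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())


def to_data_points(resource: str, parsed: Dict[str, Fields]) -> List[DataPoint]:
    """Map parsed PSI values onto the metric and attribute names proposed in this PR."""
    points: List[DataPoint] = []
    for stall_type, fields in parsed.items():
        base = {"system.psi.resource": resource, "system.psi.stall_type": stall_type}
        for key, window in WINDOWS.items():
            if key in fields:
                attrs = {**base, "system.psi.window": window}
                points.append(("system.linux.psi.pressure", fields[key], attrs))
        if "total" in fields:
            # Assumption: the cumulative counter carries no window attribute.
            points.append(("system.linux.psi.total_time", fields["total"], dict(base)))
    return points


def stall_percent(total_start_us: int, total_end_us: int, interval_s: float) -> float:
    """Percent of wall-clock time stalled between two readings of the total counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 100.0 * (total_end_us - total_start_us) / (interval_s * 1_000_000)


sample = """some avg10=1.50 avg60=0.80 avg300=0.20 total=4321000
full avg10=0.40 avg60=0.10 avg300=0.05 total=987000"""
points = to_data_points("memory", parse_psi(sample))
print(len(points))  # 8
print(stall_percent(4_321_000, 5_821_000, 10.0))  # 15.0
```

The stall_percent helper is a plain windowed rate over the total counter. The kernel's avg10/avg60/avg300 values are exponentially decaying running averages, so a rate derived from system.linux.psi.total_time will track the pressure gauge but will not match it exactly.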
New Metrics
- system.linux.psi.pressure (Gauge): Measures resource pressure as the percentage of time that tasks were stalled over a time window (10s, 60s, or 300s).
- system.linux.psi.total_time (Counter): Tracks the total cumulative stall time in microseconds since system boot.
New Attributes
- system.psi.resource: The resource type (cpu, memory, io).
- system.psi.stall_type: The stall severity (some for partial stalls, where at least some tasks are stalled; full for complete stalls, where all non-idle tasks are blocked).
- system.psi.window: The time window for the pressure calculation (10s, 60s, 300s).
Use Cases
PSI metrics enable:
References
Relevant issues and PRs
There are issues on this matter in:
And 2 PRs that I am proposing to address these issues:
Important
Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.
Merge requirement checklist
[chore] Reopened #2996