pillar/types: add NumKmsgDropped metric to NewlogMetrics #5625
Draft
rucoder wants to merge 1 commit into lf-edge:master from
Add NumKmsgDropped field to NewlogMetrics to track kernel messages lost due to kernel ring buffer overflow. This makes kernel log loss observable via the controller.

Under heavy system load, newlogd can fall behind reading /dev/kmsg, causing the kernel ring buffer (128KB by default) to overflow and silently drop messages. Currently there is no metric to detect this. The new field will be populated by newlogd using /dev/kmsg sequence number gap detection.

Signed-off-by: Mikhail Malyshev <mike.malyshev@gmail.com>
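The sequence number gap detection mentioned in the commit message could look roughly like the sketch below. It is not the newlogd implementation (that lands in the follow-up pkg/newlog PR); the names parseKmsgSeq and gapCounter and the sample records are hypothetical. It relies only on the documented /dev/kmsg record layout, where each record starts with a "prio,seq,timestamp,flags" header followed by ";" and the message text.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKmsgSeq extracts the sequence number from one /dev/kmsg record.
// Records have the form "prio,seq,timestamp,flags[,...];message".
// (Hypothetical helper, not taken from newlogd.)
func parseKmsgSeq(record string) (uint64, error) {
	header, _, found := strings.Cut(record, ";")
	if !found {
		return 0, fmt.Errorf("no ';' separator in record %q", record)
	}
	fields := strings.Split(header, ",")
	if len(fields) < 2 {
		return 0, fmt.Errorf("short header in record %q", record)
	}
	return strconv.ParseUint(fields[1], 10, 64)
}

// gapCounter accumulates the number of records skipped between reads.
type gapCounter struct {
	haveLast bool
	lastSeq  uint64
	dropped  uint64 // assumed to be the value that would feed NumKmsgDropped
}

// observe records seq and adds any gap since the previously seen record.
func (g *gapCounter) observe(seq uint64) {
	if g.haveLast && seq > g.lastSeq+1 {
		g.dropped += seq - g.lastSeq - 1
	}
	g.haveLast = true
	g.lastSeq = seq
}

func main() {
	// Made-up sample records; a real reader would read them from /dev/kmsg.
	records := []string{
		"6,100,1000,-;first message",
		"6,101,1001,-;second message",
		"4,105,1005,-;reader fell behind, 102-104 were overwritten",
	}
	var g gapCounter
	for _, r := range records {
		seq, err := parseKmsgSeq(r)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		g.observe(seq)
	}
	fmt.Println("dropped:", g.dropped) // prints "dropped: 3"
}
```

Counting gaps rather than individual read errors gives the number of lost records directly, which is the quantity a NumKmsgDropped counter would presumably report.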
Force-pushed from 0b8e109 to 06cbc44
europaul approved these changes on Feb 24, 2026
Contributor, Author:
We decided to update eve-api as well. Moving to draft for now.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #5625      +/-   ##
==========================================
+ Coverage   19.52%   29.49%   +9.96%
==========================================
  Files          19       18       -1
  Lines        3021     2417     -604
==========================================
+ Hits          590      713     +123
+ Misses       2310     1552     -758
- Partials      121      152      +31
Description
Add NumKmsgDropped field to NewlogMetrics to track kernel messages lost due to kernel ring buffer overflow. This makes kernel log loss observable via the controller in the future.

Under heavy system load, newlogd can fall behind reading /dev/kmsg, causing the kernel ring buffer (128KB by default, CONFIG_LOG_BUF_SHIFT=17) to overflow and silently drop messages, typically the earliest messages that contain the root cause of the problem being debugged.

Currently there is no metric to detect this loss. The new field will be populated by newlogd using /dev/kmsg sequence number gap detection (in a follow-up PR to pkg/newlog).

PR dependencies
None. This is the first PR in a two-PR sequence:
pkg/newlog: implements the kernel log pipeline improvements and populates the metric (depends on this PR being merged and vendored)

How to test and validate this PR
This PR only adds a new field to a struct. It has no behavioral change on its own. Validation:
cd pkg/pillar && go build ./types/ (passes)
pkg/newlog PR

Changelog notes
Added NumKmsgDropped metric to track kernel message loss due to ring buffer overflow. This metric will be populated by newlogd once the companion newlog changes land.

PR Backports
Checklist
I've provided a proper description
I've added the proper documentation
I've tested my PR on amd64 device
I've tested my PR on arm64 device
I've written the test verification instructions
I've set the proper labels to this PR
I've checked the boxes above, or I've provided a good reason why I didn't
check them.
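To illustrate the point above that adding a struct field has no behavioral change on its own, here is a minimal sketch. The struct newlogMetricsSketch is a hypothetical stand-in, not the real NewlogMetrics; ExistingCounter is a placeholder for the fields the struct already has, and the uint64 type of NumKmsgDropped is an assumption.

```go
package main

import "fmt"

// newlogMetricsSketch is a hypothetical stand-in for NewlogMetrics.
type newlogMetricsSketch struct {
	ExistingCounter uint64 // placeholder for the fields that already exist
	NumKmsgDropped  uint64 // new field; the type is assumed
}

func main() {
	// A call site written before the field existed still compiles,
	// and the new field simply takes its zero value.
	m := newlogMetricsSketch{ExistingCounter: 42}
	fmt.Println(m.ExistingCounter, m.NumKmsgDropped) // prints "42 0"
}
```

Until something populates the field, readers of the struct see 0, so a successful go build of the types package is a reasonable validation step for this PR.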