feat: add blog post for --enable-feature=use-uncached-io #2869
@@ -0,0 +1,53 @@
---
title: "Uncached I/O in Prometheus"
created_at: 2026-03-05
kind: article
author_name: "Ayoub Mrini (@machine424)"
---

<!-- more -->

Do you find yourself constantly looking up the difference between `container_memory_usage_bytes`, `container_memory_working_set_bytes`, and `container_memory_rss`? It gets worse when you pick the wrong one to set a memory limit, interpret benchmark results, or debug an OOMKilled container.
Suggested change:
- Do you find yourself constantly looking up the difference between `container_memory_usage_bytes`, `container_memory_working_set_bytes`, and `container_memory_rss`? It gets worse when you pick the wrong one to set a memory limit, interpret benchmark results, or debug an OOMKilled container.
+ Do you find yourself constantly looking up the difference between `container_memory_usage_bytes`, `container_memory_working_set_bytes`, and `container_memory_rss`? Do you know which one to use for memory limits, benchmark result interpretation, or OOMKilled debugging?
reworded
Suggested change:
- You're not alone. There is even a [9-year-old Kubernetes issue](https://github.com/kubernetes/kubernetes/issues/43916) that captures the frustration of many others.
+ You're not alone. There is even a [9-year-old Kubernetes issue](https://github.com/kubernetes/kubernetes/issues/43916) that captures the frustration of users.
sure
Should we narrow down the OS? I'd add that this blog post applies to Linux only (both AMD and ARM).
added a "NOTE"
Can we mention clearly that the page cache holds best-effort data: the moment the kernel needs memory for other processes, it should be able to reclaim this cache. But on a large box with unused memory, the cache can grow until memory is marked as "used" up to the limit of that box, which can be scary and confusing even though this memory can be reclaimed on demand.
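To make the point above concrete, here is a minimal sketch of how reclaimable page cache can inflate a "usage" figure. It assumes a cgroup v2 style `memory.stat` layout and approximates usage as `anon + file`, which is a simplification (real usage also counts kernel memory and more). The working-set rule shown, usage minus inactive file pages clamped at zero, reflects my understanding of how `container_memory_working_set_bytes` is derived. The helper names and sample numbers are hypothetical.

```python
# Hypothetical sample in the style of a cgroup v2 memory.stat file.
# Values are made up for illustration: 1 GiB anon, 3 GiB page cache.
SAMPLE_MEMORY_STAT = """\
anon 1073741824
file 3221225472
inactive_file 2147483648
active_file 1073741824
"""


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse 'key value' lines into a dict, skipping malformed lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Usage minus inactive file pages, clamped at zero.

    Assumption: this mirrors how the working-set metric is commonly
    derived; it is not copied from any particular implementation.
    """
    return max(usage_bytes - inactive_file_bytes, 0)


stats = parse_memory_stat(SAMPLE_MEMORY_STAT)

# Simplification: treat usage as anonymous memory plus page cache.
usage = stats["anon"] + stats["file"]
working_set = working_set_bytes(usage, stats["inactive_file"])

gib = 1024 ** 3
print(f"usage:       {usage / gib:.1f} GiB")
print(f"working set: {working_set / gib:.1f} GiB")
print(f"anon only:   {stats['anon'] / gib:.1f} GiB")
```

In this made-up sample, three quarters of the reported usage is page cache that the kernel could reclaim under memory pressure, which is why the usage figure alone can look alarming on a box with plenty of spare memory.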
added a mention and a link
machine424 marked this conversation as resolved.
tiny nit, but for the future: this flag probably could be called "uncached-io" - we don't call our flags "enable-"
it's use-uncached-io, but I agree, we can always do better with naming :)
This section mentions the page cache without actually defining it. Is it worth explaining to the reader what the page cache is (or at least linking to Wikipedia, etc.)?
Added a link. As you know, it's a balance between clarity and conciseness; if I start explaining "feature flag", "compaction", or "disk writes"...
Even though the post isn't highly technical, it’s intended for readers who are already somewhat familiar with the concepts/limitations mentioned.
I also expect that many people will read this through an LLM, which can supply any missing references or additional details...
> Compaction writes are a good example, because once written, that data is unlikely to be read again soon.

Can we explain why it is unlikely? This data is used for long-term storage queries. It's worth mentioning that, in practice, the majority of queries hit only the last 24h or even 1h.
reworded
My first question is: does it have an impact on other metrics? On the performance of other things? Would that be useful to mention in this blog post?
Based on the benchmarks I ran, there are no notable performance improvements to report. Of course, I would have mentioned any regressions if I had encountered them.
Maybe users running large, long-lived instances will share whether they saw any improvements.
machine424 marked this conversation as resolved.
👍🏽