KEP-5032: Container log rotation on Disk pressure #5022
base: master
Conversation
Zeel-Patel commented on Jan 7, 2025 (edited)
- One-line PR description: Rotate container logs when there is disk pressure on the kubelet host.
- Issue link: Container log rotation on Disk pressure #5032
Welcome @Zeel-Patel!
Hi @Zeel-Patel. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Zeel-Patel. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
/sig-node
## Design Details
Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.
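For illustration only, here is a rough sketch of what the two proposed fields could look like if added to the kubelet configuration type. The field names mirror the KEP text; the Go types, placement, and example values are assumptions for this sketch, not the actual Kubernetes API.

```go
// Hypothetical sketch only: the proposed kubelet configuration fields from the
// KEP, with assumed types. Not the real KubeletConfiguration definition.
package config

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type KubeletConfiguration struct {
	// ... existing fields elided ...

	// LogRotateDiskCheckInterval is how often the ContainerLogManager checks
	// disk usage on the kubelet host (e.g. "30s"). Assumed type.
	LogRotateDiskCheckInterval metav1.Duration `json:"logRotateDiskCheckInterval,omitempty"`

	// LogRotateDiskPressureThreshold is the disk usage percentage above which
	// the kubelet forces container log rotation/cleanup (e.g. 85). Assumed type.
	LogRotateDiskPressureThreshold int32 `json:"logRotateDiskPressureThreshold,omitempty"`
}
```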
I was thinking that when there is disk pressure in nodefs.available/imagefs.available we can add rotateLogs to our list of functions we run in case of disk pressure.
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/eviction/helpers.go#L1195-#L1230
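To make that idea concrete, here is a minimal, self-contained sketch (not the actual kubelet eviction code) of registering log rotation as one more reclaim function for the disk-pressure signals. `reclaimFunc`, `buildSignalToReclaimFuncs`, and `rotateContainerLogs` are hypothetical names used only for illustration.

```go
// Hypothetical sketch: wire log rotation into the list of reclaim functions
// that run when nodefs.available / imagefs.available report pressure.
package main

import "fmt"

// reclaimFunc frees node resources; it returns an error if reclaim failed.
type reclaimFunc func() error

// rotateContainerLogs stands in for a call into the ContainerLogManager that
// forces rotation/cleanup of container logs exceeding their configured limits.
func rotateContainerLogs() error {
	fmt.Println("rotating container logs that exceed configured limits")
	return nil
}

func buildSignalToReclaimFuncs() map[string][]reclaimFunc {
	deleteUnusedContainers := func() error { return nil } // placeholder for container GC
	deleteUnusedImages := func() error { return nil }     // placeholder for image GC

	return map[string][]reclaimFunc{
		// On nodefs pressure, try log cleanup first, then container and image GC.
		"nodefs.available":  {rotateContainerLogs, deleteUnusedContainers, deleteUnusedImages},
		"imagefs.available": {rotateContainerLogs, deleteUnusedImages},
	}
}

func main() {
	for signal, funcs := range buildSignalToReclaimFuncs() {
		fmt.Printf("%s -> %d reclaim funcs\n", signal, len(funcs))
	}
}
```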
That's better. Thanks.
But won't that use kubelet eviction thresholds to identify disk pressure?
I suppose this log cleanup is another form of eviction to make space, like we do with images and containers. So adding it to the `nodefs.available` / `imagefs.available` reclaim path would logically make a great addition.
big +1, this was my initial thinking as well. Detecting disk pressure should be a trigger for log rotation in addition to the timer-based approach.
Sure. Considering this: #5022 (comment). If we rely on the same kubelet eviction thresholds, log rotation itself may fail once pressure is reached; we have experienced this as well, where creation of the log copy fails.
Open to suggestions.
@ffromani @harshanarayana @kannon92
Listing a few options below; let me know your thoughts:
- We use the same disk eviction thresholds. Every time there is disk pressure, check the size of the .gz log files of all containers. If the combined file size (for a particular container) exceeds (containerLogMaxSize * containerLogMaxFiles), simply delete the older .gz files until the remaining size is within this limit; no log rotation. So if we have containerLogMaxSize = 20MiB and containerLogMaxFiles = 6, we ensure that the combined log file size remains within 120MiB. It could also happen that a single .gz file ends up taking all 120MiB after cleanup. (A rough sketch of this cleanup follows after the list.)
- Blindly do log rotation as proposed in the KEP (but using the disk pressure thresholds as suggested by you all).
- Make the change suggested in option 1 part of the ContainerLogManager's periodic log rotation itself.
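For option 1, a rough, self-contained sketch is shown below, assuming a per-container directory of rotated `.gz` files. `enforceRotatedLogBudget` and its parameters are hypothetical; the sketch only illustrates deleting the oldest rotated files until the combined size fits the `containerLogMaxSize * containerLogMaxFiles` budget.

```go
// Hypothetical sketch of option 1 (not actual kubelet code): delete the oldest
// rotated .gz logs of one container until the combined size fits the budget.
package main

import (
	"os"
	"path/filepath"
	"sort"
)

// enforceRotatedLogBudget removes the oldest rotated logs in logDir until the
// combined size of the remaining *.gz files fits within maxBytes.
func enforceRotatedLogBudget(logDir string, maxBytes int64) error {
	paths, err := filepath.Glob(filepath.Join(logDir, "*.gz"))
	if err != nil {
		return err
	}

	type rotated struct {
		path string
		info os.FileInfo
	}
	var files []rotated
	var total int64
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			continue // file may have been removed concurrently
		}
		files = append(files, rotated{p, info})
		total += info.Size()
	}

	// Oldest first, so the least recent rotations are deleted first.
	sort.Slice(files, func(i, j int) bool {
		return files[i].info.ModTime().Before(files[j].info.ModTime())
	})

	for _, f := range files {
		if total <= maxBytes {
			break
		}
		if err := os.Remove(f.path); err == nil {
			total -= f.info.Size()
		}
	}
	return nil
}

func main() {
	// Example budget: containerLogMaxSize = 20 MiB, containerLogMaxFiles = 6.
	_ = enforceRotatedLogBudget("/var/log/pods/example/container", 6*20<<20)
}
```

Deleting oldest-first keeps the most recent logs available, at the cost of losing the oldest history first.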
My suggestion is that when disk pressure is hit (by our eviction settings), we would trigger your logic for forcing rotations on any logs that exceed the limits.
Though the fact that they exceed the limits seems to be a bug...
The goal would be to preemptively try to clean up disk space without evicting.
Looking at this goal, option 1 seems good to me: use the same disk eviction thresholds and, whenever there is disk pressure, delete the older .gz files of a container until its combined log file size is back within containerLogMaxSize * containerLogMaxFiles, without forcing a rotation.
/ok-to-test
Leaving a few comments. Feel free to ignore them if you consider them irrelevant/invalid.
Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.
- `logRotateDiskCheckInterval` is the time interval within which the ContainerLogManager will check disk usage on the kubelet host.
Here are my two cents. I was the one who added a similar config to the log rotation workflow, where it can take a configurable number of workers with a duration and rotate logs in async mode, to help reduce the leak of log file size when the container log generation rate is way too high.
That definitely helps, but never really prevents the issue. The safest way to deal with this would be to truncate and cap the size as the logs are being written, instead of monitor, rotate and clean up; that would be the most foolproof way to do it.
That being said, in the absence of that workflow, I like your suggestion.
Regarding "it will rotate logs of all the containers of the kubelet": I am not sure all services/containers should be taxed because of one service's misbehaviour, given how `kubectl logs` would look after one such global rotation. As far as I can recall, the `.gz` files are ignored when `kubectl logs` is used, so forcing a global rotation can lead to logs going missing from `kubectl logs` out of nowhere.
We should still clean up only those logs that have exceeded the configured threshold in this loop, so that the other services can keep behaving the way they do.
Yes, that's what we are proposing.
Currently, log files are exceeding the set thresholds anyway, which causes high disk usage.
Instead of evicting pods, we can first try rotating the logs of such containers (containers whose log size exceeds the containerLogMaxSize threshold).
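To illustrate that selection step (rotate only the offending containers), here is a hypothetical sketch. `containerLog`, `logsExceedingLimit`, and `rotateLog` are made-up names; the real ContainerLogManager would obtain the log paths from the runtime rather than a hard-coded list.

```go
// Hypothetical sketch (not kubelet code): select only containers whose current
// log file exceeds containerLogMaxSize and force a rotation for those.
package main

import (
	"fmt"
	"os"
)

type containerLog struct {
	containerID string
	logPath     string // current (uncompressed) log file, e.g. 0.log
}

// logsExceedingLimit returns the containers whose live log file is already
// larger than maxSize bytes.
func logsExceedingLimit(logs []containerLog, maxSize int64) []containerLog {
	var offenders []containerLog
	for _, l := range logs {
		info, err := os.Stat(l.logPath)
		if err != nil {
			continue // container may have been removed
		}
		if info.Size() > maxSize {
			offenders = append(offenders, l)
		}
	}
	return offenders
}

func rotateLog(l containerLog) error {
	// Placeholder for the actual rotation (compress current file, start a new one).
	fmt.Printf("forcing rotation for container %s (%s)\n", l.containerID, l.logPath)
	return nil
}

func main() {
	const containerLogMaxSize = 200 << 20 // 200 MiB, mirroring the example in the KEP
	logs := []containerLog{{containerID: "abc123", logPath: "/var/log/pods/example/app/0.log"}}
	for _, l := range logsExceedingLimit(logs, containerLogMaxSize) {
		_ = rotateLog(l)
	}
}
```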
```
containerLogMaxSize = 200M
containerLogMaxFiles = 6
```
FYI, there are also two other fields to configure the workers and the monitoring duration for the cleanup: `containerLogMaxWorkers` and `containerLogMonitorInterval`, respectively.
If a pod has been generating logs in gigabytes with minimal delay, it can cause disk pressure on the kubelet host, and that can affect other pods running on the same kubelet.
This can also cause further container log rotation to fail and make the problem even worse.
Case in point, an issue we ran into very recently:
Dec 11 21:35:31 tardis-node-192-168-4-51 kubelet[8204]: E1211 21:35:31.794811 8204 container_log_manager.go:296] "Failed to rotate log for container" err="failed to compress log "/var/log/pods/tardis-pod-2-5bb1ae31-tm-78475757b8-2mbl9_f53b5f87-5b48-4748-a919-9eb702bb5f0c/taskmanager/2.log.20241211-202335": failed to create temporary log "/var/log/pods/tardis-pod-2-5bb1ae31-tm-78475757b8-2mbl9_f53b5f87-5b48-4748-a919-9eb702bb5f0c/taskmanager/2.log.20241211-202335.tmp": open /var/log/pods/tardis-pod-2-5bb1ae31-tm-78475757b8-2mbl9_f53b5f87-5b48-4748-a919-9eb702bb5f0c/taskmanager/2.log.20241211-202335.tmp: disk quota exceeded" path="/var/log/pods/tardis-pod-2-5bb1ae31-tm-78475757b8-2mbl9_f53b5f87-5b48-4748-a919-9eb702bb5f0c/taskmanager/2.log" containerID="2c7de6fdcbda4498543ab3cc68d59dcee487e08ad7d37751f9ea5c366975f784"
Yes, we have faced this issue under disk pressure as well, but the periodic log rotation is failing anyway.
If we choose `logRotateDiskPressureThreshold` wisely, it would be helpful.
WDYT?
It should definitely help. But choosing the right size and monitoring frequency is trial and error, and I can't think of an effective way to determine the number. That is beyond the scope of this KEP anyway.
### Goals

- Rotate and Clean all container logs on kubelet Disk pressure
Would there be a configuration required to help identify which logs are to be selected for such a cleanup?
Containers whose log size exceeds containerLogMaxSize will be selected for cleanup.
editor: "@Zeel-Patel"
creation-date: 2025-01-08
last-updated: 2025-01-08
reviewers:
<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.

To get started with this template:

- [ ] **Pick a hosting SIG.**
  Make sure that the problem space is something the SIG is interested in taking
  up. KEPs should not be checked in without a sponsoring SIG.
- [ ] **Create an issue in kubernetes/enhancements**
  When filing an enhancement tracking issue, please make sure to complete all
  fields in that template. One of the fields asks for a link to the KEP. You
  can leave that blank until this KEP is filed, and then go back to the
  enhancement and add the link.
- [ ] **Make a copy of this template directory.**
  Copy this template into the owning SIG's directory and name it
  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
  leading-zero padding) assigned to your enhancement above.
- [ ] **Fill out as much of the kep.yaml file as you can.**
  At minimum, you should fill in the "Title", "Authors", "Owning-sig",
  "Status", and date-related fields.
- [ ] **Fill out this file as best you can.**
  At minimum, you should fill in the "Summary" and "Motivation" sections.
  These should be easy if you've preflighted the idea of the KEP with the
  appropriate SIG(s).
- [ ] **Create a PR for this KEP.**
  Assign it to people in the SIG who are sponsoring this process.
- [ ] **Merge early and iterate.**
  Avoid getting hung up on specific details and instead aim to get the goals of
  the KEP clarified and merged quickly. The best way to do this is to just
  start with the high-level sections and fill out details incrementally in
  subsequent PRs.

Just because a KEP is merged does not mean it is complete or approved. Any KEP
marked as `provisional` is a working document and subject to change. You can
denote sections that are under active debate as follows:
feel free to remove comments when no longer needed, to make review (and reading in general) easier
Rotate container logs when there is disk pressure on the kubelet host.

## Motivation
we will need something here
<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?.
The "Design Details" section below is for the real
nitty-gritty.
-->
we will need some details here
Brainstorming, let's evaluate some ideas:
- If the log rotation manager detects that the previous X rotations (1? 2? N?) made the rotated logs exceed the configured max size, because containers are producing many logs continuously, should this cause the kubelet to report disk pressure?
- Do we want or need to review log retention also? If a rotated log is, say, 500 megs while it is supposed to be 100 megs (random example numbers), should that cause the kubelet to report disk pressure? Arguably, we are consuming more disk space than the user configured, and thus expects.
I am not sure these should trigger a disk pressure report; that can have side effects, right?
However, this can definitely be turned into a warning event at the node level to indicate that something is going wrong. Would that lead to an event storm in some cases? It depends on how frequently we generate the event.
Regarding "if the log rotation manager detects that the previous X rotations (1? 2? N?) made the rotated logs exceed the configured max size": this problem becomes even more concerning with a high value set for `containerLogMaxFiles`.
editor: "@Zeel-Patel"
creation-date: 2025-01-08
last-updated: 2025-01-08
reviewers:
please also add @harshanarayana
### Risks and Mitigations

No identified risk.
let's think a bit more about this, especially if we start from beta. What can go wrong?
- Will enabling / disabling the feature require downtime or reprovisioning of a node? Yes
will it? why exactly? could you please elaborate?
- I suppose this is a function of what your cloud provider lets you do in terms of tuning the kubelet configuration.
- For on-prem deployments, this is definitely just a kubelet restart away, so not much of a downtime/re-provisioning concern.
The cloud provider use case is interesting indeed, but from the project's perspective this looks like the bare-metal use case anyway. I think a good first step is just adding more details to the answer here.
@Zeel-Patel Please create an issue in kubernetes/enhancements that tracks this KEP; that would be the title of the KEP. The k/k issue is used as a discussion to create the KEP. Sorry to be a pain about this.
### Goals

- Rotate and Clean all container logs on kubelet Disk pressure
Suggested change:
- Rotate and Clean all container logs on kubelet Disk pressure
+ Rotate and Clean all container logs on kubelet Disk pressure that has exceeded the configured log retention quota
### What would you like to be added?

The [ContainerLogManager](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers in case of disk pressure on host.
Suggested change:
- The [ContainerLogManager](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers in case of disk pressure on host.
+ The [ContainerLogManager](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L52-L60), responsible for log rotation and cleanup of log files of containers periodically, should also rotate logs of all containers that has exceeded the configured log retention quota in case of disk pressure on host.
## Proposal
<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?.
The "Design Details" section below is for the real
nitty-gritty.
-->
Some of the details from the KEP should go into this section.
No

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
ContainerLogManager of kubelet will use more CPU cycle then now.
Suggested change:
- ContainerLogManager of kubelet will use more CPU cycle then now.
+ CPU cycles usage of ContainerLogManager of kubelet will increase.
## Motivation

- A lot of our kubelet hosts experienced disk pressure because a certain set of pods was generating logs at a very high rate, around 3-4 GiB in 15 minutes. We had containerLogMaxSize set to 200 MiB and containerLogMaxFiles set to 6, but the .gz files ended up being around 500-600 GiB in size. We observed that container log rotation was slow for us.
So it seems to be a bug that logs exceed the limit we specify. I know @harshanarayana worked on the parallel workers to clean up logs.
Reading this, it seems that we are proposing a feature to resolve a bug where log rotation exceeds the limit.
Is it worth focusing on fixing that issue instead?
I.e., what I am trying to get at is: if we "fix" this bug of log rotations exceeding limits, is there still a need for this KEP?
In this write-up, who is "our"? The Kubernetes project?
Updated
I worry that this won't work out that well when we come close to disk pressure. Let's say that a few pods are exceeding their limits and then we trigger a function to rotate. If we are close to the host file system limits, would log rotation still be expected to work?
/retitle KEP-5032: Container log rotation on Disk pressure
There's no alternatives section. Let's add one.
Another option: provide a means for an external tool to trigger the kubelet to rotate its logs. That would move the policy decisions outside of the kubelet; for example, into a DaemonSet.
I wanted to suggest this, but couldn't see the page section where it would live.
There's no proposal. You should add one.
@@ -0,0 +1,17 @@
title: Log rotate on Disk pressure
kep-number: TBD
5032
/sig node
Here is a long rant/response to a very valid point that both of you brought up. I think the source of the problem is the fact that the kubelet is the one rotating after the logs are written onto the disk. So, no matter how hard we try, there will always be a possibility of a leak, since we are monitoring and reacting after the fact. The interval can be tuned to as small a value as possible at the cost of taxing the kubelet, but that will also not be foolproof. The safest approach would be to truncate the log to size in the writer and not write logs larger than the configured size to the host, dropping everything else. But that also comes with a lot of problems: what if we end up with partial logs? That can break a lot of tooling and workflows.
We tried this out internally and saw more or less the same behavior as doing it in the kubelet. It did give us more configuration knobs and options, but the underlying matter is that we are doing the cleanup after the logs are written. So we are equally failure-prone, are we not?
This KEP and my old PR both did part of that work, but neither got it fully accurate. The crux is in how we clean up. Say you have a heavy-logging pod that is writing 10G per second (silly number, but hey, where is the fun otherwise?), and say your configured limit was 100M per pod with 5 files. The one file being written will easily exceed all configured limits. Now the question is: how do you clean up these files? Do you clean up the entire file, or part of the file? Both of which can be really flaky and very opinionated, and can cause loss of logs in either case. So when my original change was done, I left out the truncation of the file and cleaned up excess files based on count alone. That fixed part of the problem, but then the current issue that @Zeel-Patel mentioned happened. Now comes another cleanup we have to do, where we sweep across, find all files exceeding the limits, and clean up. But this cleanup is bound by the same problem that my original change would have had to deal with. Honestly, the safest mitigation I can think of would be to find a mechanism to truncate at the source, ensuring we do not run into partial logs being written, and then the kubelet can just do the count-based cleanup.
This is only safe if the process which writes the log data is also the one which rotates, because it knows the message boundaries. But we removed log rotation from klog. There is now a proposal in kubernetes/kubernetes#127667 to add it back in logrunner. Perhaps something could be done with datagram stream sockets (which preserve message boundaries) there.
AIUI there is no safe way to have a process ship logs; it is always possible to have a log volume that fills either local storage or a local buffer. There's no safe means because the rate of application log writes is formally not bounded, whereas buffers and local storage are always finite. If we provide a way for container log writes to go to a socket or other file descriptor not backed by local storage, there is no need to rotate. That approach doesn't prevent log entries being dropped, but it is proof against any local storage exhaustion.
Absolutely, couldn't agree more on this. I noticed that PR a couple of days ago, but how does that help for services that do not really use that binary to run their process? For those, it is containerd that is doing the write, correct? https://github.com/containerd/containerd/blob/main/internal/cri/io/logger.go (is my understanding of this incorrect?) For example, containerd exposes a knob
@sftim Most definitely. But this might not always be possible for everyone to configure, even if provided as an option. I still remember the days when Docker had a whole slew of plugins one could configure for log management that could help avoid writing to local storage, or you could create one yourself.
It doesn't. An app has to be configured explicitly to redirect output to its own files and then not write anything to stdout/stderr... which breaks
I'd definitely support a KEP to make container logging more extensible / pluggable. The story we have now is OK but not great.
what do you think about this proposal?