
KEP-5032: Container log Split and Rotation to avoid Disk pressure #5022

Open
wants to merge 13 commits into
base: master
319 changes: 319 additions & 0 deletions keps/sig-node/5032-log-eviction-on-disk-pressure/README.md
@@ -0,0 +1,319 @@
# KEP-5032: Container log Split and Rotate to avoid Disk pressure

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [What would you like to be added?](#what-would-you-like-to-be-added)
- [Why is this needed?](#why-is-this-needed)
- [Spec 1](#spec-1)
- [Spec 2](#spec-2)
- [Goals](#goals)
- [Proposal](#proposal)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit tests](#unit-tests)
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Only Rotation on disk pressure](#only-rotation-on-disk-pressure)
- [DaemonSet to cleanup logs](#daemonset-to-cleanup-logs)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes


[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Split, clean, and rotate container logs to avoid disk pressure on the kubelet host.

## Motivation

- We manage a Kubernetes ecosystem at our organization. Many of our kubelet hosts experienced disk pressure because a certain set of pods was generating logs at a very high rate, around 3-4 GiB of logs every 15 minutes. We had the kubelet config `containerLogMaxSize` set to 200 MiB and `containerLogMaxFiles` set to 6, yet the .gz files (compressed log files of the pods) grew to around 500-600 GiB. We observed that container log rotation was too slow to keep up.

This is very specific and doesn't represent Kubernetes as an organization. @Zeel-Patel you should reword this motivation to cover the Kubernetes project motivation to make this change.


### What would you like to be added?

We expect that each container log file always stays under the configured kubelet limit (i.e., `containerLogMaxSize`), which can help prevent such disk pressure issues in the future.

In this paragraph, who does “we” refer to?


### Why is this needed?

It often happens that containers generating heavy log data end up with compressed log files whose size exceeds the `containerLogMaxSize` limit set in the kubelet config.

For example, the kubelet is configured with:
```
containerLogMaxSize: 200Mi
containerLogMaxFiles: 6
```
With these settings, the total retained log data per container should stay around 200 MiB × 6 = 1.2 GiB, yet the examples below show individual compressed files well above the per-file limit.

### Spec 1

Continuously generating 10 MiB of logs with a 0.1 sec sleep in between (roughly 100 MiB/s)
Comment on lines +80 to +82

Suggested change
### Spec 1
Continuously generating 10Mib with 0.1 sec sleep in between
### Example 1
This manifest defines a Job that continuously generates 10MiB of logs with 0.1 sec sleep in between (100MiB/s):

```
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-huge-logs
spec:
  template:
    spec:
      containers:
      - name: log-generator
        image: busybox
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Generate huge log entries to stdout
          start_time=$(date +%s)
          log_size=0
          target_size=$((4 * 1024 * 1024 * 1024)) # 4 GiB target size in bytes
          while [ $log_size -lt $target_size ]; do
            # Generate 10 MiB of random data and write it to stdout
            echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
            log_size=$(($log_size + 10485760)) # Increment size by 10 MiB
            sleep 0.1 # Sleep to control log generation speed
          done
          end_time=$(date +%s)
          echo "Log generation completed in $((end_time - start_time)) seconds"
      restartPolicy: Never
  backoffLimit: 4
```
File sizes
```
-rw-r----- 1 root root 24142862 Jan 1 11:41 0.log
-rw-r--r-- 1 root root 183335398 Jan 1 11:40 0.log.20250101-113948.gz
-rw-r--r-- 1 root root 364144934 Jan 1 11:40 0.log.20250101-114003.gz
-rw-r--r-- 1 root root 487803789 Jan 1 11:40 0.log.20250101-114023.gz
-rw-r--r-- 1 root root 577188544 Jan 1 11:41 0.log.20250101-114047.gz
-rw-r----- 1 root root 730449620 Jan 1 11:41 0.log.20250101-114115
```

### Spec 2

Continuously generating 10 MiB of logs with a 10 sec sleep in between (roughly 1 MiB/s on average)
Comment on lines +122 to +124

Suggested change
### Spec 2
Continuously generating 10Mib with 10 sec sleep in between
### Example 2
This manifest defines a Job that continuously generates 10MiB of logs with 10 sec sleep in between (1MiB/s mean):

```
apiVersion: batch/v1
kind: Job
metadata:
  name: generate-huge-logs
spec:
  template:
    spec:
      containers:
      - name: log-generator
        image: busybox
        command: ["/bin/sh", "-c"]
        args:
        - |
          # Generate huge log entries to stdout
          start_time=$(date +%s)
          log_size=0
          target_size=$((4 * 1024 * 1024 * 1024)) # 4 GiB target size in bytes
          while [ $log_size -lt $target_size ]; do
            # Generate 10 MiB of random data and write it to stdout
            echo "Generating huge log entry at $(date) - $(dd if=/dev/urandom bs=10M count=1 2>/dev/null)"
            log_size=$(($log_size + 10485760)) # Increment size by 10 MiB
            sleep 10 # Sleep 10 seconds to control log generation speed
          done
          end_time=$(date +%s)
          echo "Log generation completed in $((end_time - start_time)) seconds"
      restartPolicy: Never
  backoffLimit: 4
```

File sizes
```
-rw-r----- 1 root root 181176268 Jan 1 11:31 0.log
-rw-r--r-- 1 root root 183336647 Jan 1 11:20 0.log.20250101-111730.gz
-rw-r--r-- 1 root root 183323382 Jan 1 11:23 0.log.20250101-112026.gz
-rw-r--r-- 1 root root 183327676 Jan 1 11:26 0.log.20250101-112321.gz
-rw-r--r-- 1 root root 183336376 Jan 1 11:29 0.log.20250101-112616.gz
-rw-r----- 1 root root 205360966 Jan 1 11:29 0.log.20250101-112911
```


If a pod generates gigabytes of logs with minimal delay, it can cause disk pressure on the kubelet host, which can affect other pods running on the same node.

### Goals

- There is a ContainerLogManager in every kubelet. It runs an infinite goroutine that checks the size of each container's active log file (the file all container log writes go to). If that exceeds the above-mentioned limit (`containerLogMaxSize`), it starts parallel workers. Each worker compresses the active file into a .gz archive, deletes old archives until at most `containerLogMaxFiles` files remain in total, and creates a new active file for the container.

This doesn't sound like a goal, maybe that belongs in the Proposal section.

- The goal is to split a large active log file into chunks of size `containerLogMaxSize`, and then perform the rest of the operations done by the ContainerLogManager.

## Proposal
- The container log rotation that exists today should keep working as is, but it will ensure that a file is under the configured size limit before rotating it. This way, every compressed log file present for a container on the host is guaranteed to be under `containerLogMaxSize`, which can avoid disk pressure on the host.

### Risks and Mitigations

We do not see any risk as of now.

Are you sure?


## Design Details

1. Implement a new function (`splitAndRotateLatestLog()`) to be called by the `rotateLog` function (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L313-L346)

Suggested change
1. Implement a new function (splitAndRotateLatestLog) to be called by rotateLog function (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/logs/container_log_manager.go#L313-L346)
1. Implement a new function (`splitAndRotateLatestLog()`) to be called by the `rotateLog` function (https://github.com/kubernetes/kubernetes/blob/9e555875e79f60a238952e91e91f79f32c053f9c/pkg/kubelet/logs/container_log_manager.go#L313-L346)

2. `rotateLog` is called by each worker for the container assigned to it.
3. It first cleans up: it deletes the original files for which compressed files already exist, as well as any .tmp files generated (and not deleted) during the last log rotation.
4. It then deletes the oldest rotated files until `containerLogMaxFiles`-2 files are left. This is because the non-rotated active file will be rotated and a new active file will be created, which brings the total up to `containerLogMaxFiles`.
5. Then it compresses the uncompressed rotated files (it does not compress the active file) and rotates the active file.
6. In the new design, before doing step 4, it will split the large active log file into chunks of size `containerLogMaxSize`, named `<active-log-file-name>.part<i>`, and rotate the parts.
7. Say this created n parts; it can then rotate the first n-1 parts, keep the last (nth) part as the new active file, and perform the delete and compress steps (basically steps 4 and 5). A minimal sketch of the split step is shown below.
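
Below is a minimal, hypothetical sketch of what `splitAndRotateLatestLog()` could look like. Only the function name comes from step 1; the signature, the byte-based splitting, and the surrounding package are illustrative assumptions (a real implementation would likely split on log-entry boundaries and reuse the existing ContainerLogManager plumbing).

```
package logs

import (
	"fmt"
	"io"
	"os"
)

// splitAndRotateLatestLog is an illustrative sketch: it splits an oversized
// active log file into parts of at most maxSize bytes, named
// <activeLog>.part<i>, and returns the part paths. The existing rotation
// logic could then compress/rotate all but the last part, which becomes the
// new active file.
func splitAndRotateLatestLog(activeLog string, maxSize int64) ([]string, error) {
	src, err := os.Open(activeLog)
	if err != nil {
		return nil, err
	}
	defer src.Close()

	var parts []string
	for i := 0; ; i++ {
		partName := fmt.Sprintf("%s.part%d", activeLog, i)
		dst, err := os.Create(partName)
		if err != nil {
			return nil, err
		}
		// Copy at most maxSize bytes into this part.
		n, copyErr := io.CopyN(dst, src, maxSize)
		dst.Close()
		if n > 0 {
			parts = append(parts, partName)
		} else {
			os.Remove(partName) // nothing left to copy, drop the empty part
		}
		if copyErr == io.EOF {
			break
		}
		if copyErr != nil {
			return nil, copyErr
		}
	}
	return parts, nil
}
```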

### Test Plan

[X] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

##### Unit tests
- Add detailed unit tests with 100% coverage.
- `<package>`: `<date>` - `<test coverage>`

##### Integration tests
- Scenarios will be covered in e2e tests.

##### e2e tests
- Add test under `kubernetes/test/e2e_node`.
- Set low values for `containerLogMaxSize` and `containerLogMaxFiles`.
- Create a pod that generates heavy logs and expect the container's combined log size to stay within `containerLogMaxSize` * `containerLogMaxFiles` (a minimal sketch of this size check follows).
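
A minimal sketch of the size check such a test could perform; the helper name and the assumption that all of a container's log files live in a single directory are illustrative only, and the real test would use the existing `test/e2e_node` framework.

```
package e2enode

import "os"

// totalContainerLogSize sums the sizes of the active log file and all rotated
// files in a container's log directory. The test would assert that this total
// stays within containerLogMaxSize * containerLogMaxFiles.
func totalContainerLogSize(logDir string) (int64, error) {
	entries, err := os.ReadDir(logDir)
	if err != nil {
		return 0, err
	}
	var total int64
	for _, entry := range entries {
		info, err := entry.Info()
		if err != nil {
			return 0, err
		}
		total += info.Size()
	}
	return total, nil
}
```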

### Graduation Criteria

**Note:** *Not required until targeted at a release.*


###### How can this feature be enabled / disabled in a live cluster?

- [X] Other
- Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control
plane? Yes (kubelet restart)
- Will enabling / disabling the feature require downtime or reprovisioning
of a node? No, a restart of the kubelet with the updated configuration and version should work.

###### Does enabling the feature change any default behavior?
No

Is this true?


###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes

###### What happens if we reenable the feature if it was previously rolled back?


Comment on lines +231 to +232

You need to answer this.

###### Are there any tests for feature enablement/disablement?
Add UTs.

Suggested change
Add UTs.
- Unit tests will be added for alpha

?


### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?
No identified risk.

###### What specific metrics should inform a rollback?
No identified risk.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
e2e tests covered.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?
Emit cleanup logs.

This doesn't explain how an operator can tell whether the feature is in use.


###### How can someone using this feature know that it is working for their instance?
Yes, from the kubelet logs.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
N/A

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
NA

###### Are there any missing metrics that would be useful to have to improve observability of this feature?
NA

### Dependencies

###### Does this feature depend on any specific services running in the cluster?
No

### Scalability

###### Will enabling / using this feature result in any new API calls?
No

###### Will enabling / using this feature result in introducing new API types?
No

###### Will enabling / using this feature result in any new calls to the cloud provider?

###### Will enabling / using this feature result in increasing size or count of the existing API objects?
No

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
CPU usage of the kubelet's ContainerLogManager will increase.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

###### What are other known failure modes?
NA

###### What steps should be taken if SLOs are not being met to determine the problem?
NA

## Implementation History
NA

## Drawbacks
No identified drawbacks.

## Alternatives

### Only Rotation on disk pressure
Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.

(nit)

Suggested change
Define 2 new flags `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.
Define 2 new parameters `logRotateDiskCheckInterval`, `logRotateDiskPressureThreshold` in kubelet config.


- `logRotateDiskCheckInterval` is the time interval at which the ContainerLogManager checks disk usage on the kubelet host.
- `logRotateDiskPressureThreshold` is the threshold for overall disk usage on the kubelet host. If actual disk usage is at or above this threshold, the kubelet rotates the logs of all containers on the node (a hypothetical sketch follows below).
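
A hypothetical sketch of how this alternative could work inside the ContainerLogManager. Note that `logRotateDiskCheckInterval` and `logRotateDiskPressureThreshold` are proposed parameters that do not exist in the kubelet today, and the function name, the use of `syscall.Statfs`, and the `rotateAllLogs` callback are illustrative assumptions only.

```
package logs

import (
	"syscall"
	"time"
)

// watchDiskPressure periodically checks disk usage of the filesystem holding
// the container logs and triggers a rotation of all container logs when usage
// crosses the threshold (a fraction between 0 and 1).
// Linux-only sketch; error handling and shutdown are omitted for brevity.
func watchDiskPressure(logRoot string, checkInterval time.Duration, threshold float64, rotateAllLogs func()) {
	for {
		var st syscall.Statfs_t
		if err := syscall.Statfs(logRoot, &st); err == nil && st.Blocks > 0 {
			used := 1.0 - float64(st.Bavail)/float64(st.Blocks)
			if used >= threshold {
				rotateAllLogs()
			}
		}
		time.Sleep(checkInterval)
	}
}
```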

### DaemonSet to cleanup logs
Provide a means for an external tool to trigger the kubelet to rotate its logs. That would move the policy decisions outside of the kubelet, for example, into a DaemonSet.
18 changes: 18 additions & 0 deletions keps/sig-node/5032-log-eviction-on-disk-pressure/kep.yaml
@@ -0,0 +1,18 @@
title: Log rotate on Disk pressure

To match the issue, try:

Suggested change
title: Log rotate on Disk pressure
title: Container log Split and Rotate to avoid Disk pressure

or

retitle the issue

kep-number: 5032
authors:
- "@Zeel-Patel"
- "@rishabh325"
owning-sig: sig-node
status: provisional
editor: "@Zeel-Patel"
creation-date: 2025-01-08
last-updated: 2025-01-08
reviewers:
- "@kannon92"
- "@ffromani"
- "@harshanarayana"
- "@leonzz"
approvers:
- TBD
latest-milestone: TBD