
'Apply/ReplaceResource' in resource_ops.go may leak files to '/dev/shm' since the kubectl 'apply/replace' commands never time out #572

Open
@jgwest

Description

gitops-engine directly calls kubectl command code to create/apply/replace/delete K8s resources on the cluster. This ensures that gitops-engine consumers (such as Argo CD) interact with those K8s resources in a way that is compatible with kubectl.

However, at present, gitops-engine does not specify a timeout value for 'kubectl create/apply/replace' commands.

This means that in rare cases (such as cluster/network issues), the kubectl operation will remain running forever, waiting for an I/O operation that may never complete.

Normally this would just be a small memory leak (i.e. not necessarily the end of the world). However, in order to call the kubectl command code, gitops-engine writes manifest files to '/dev/shm', which are then passed to kubectl via the '-f' file option.

This means that those long-running I/O operations are also leaking K8s manifest files to /dev/shm: the K8s manifest files must remain in '/dev/shm' while the I/O operation is in progress. '/dev/shm' appears limited to 64MB, which can fill quickly.

  • When examining the contents of /dev/shm from users that have reported this issue, we see a large number of miscellaneous manifests that are hours or days old (dating back to the last Pod restart).

The proposed solution (PR attached) is to add a long default timeout to calls to kubectl's apply command.

Related: #568
