Skip to content

Race when creating azure-vnet lock directory with permissions #2818

Open
@QxBytes

Description

@QxBytes

What happened:
On the first boot, no CNI binary is on the node, and so k8s creates the /var/run/azure-vnet directory with 0755 permissions automatically because it is a mount part of the azure-cns daemonset. Then the CNI is deployed.
The /var/run directory is not preserved between reboots.
Then, when the VM reboots, the CNI binary may run before k8s creates the /var/run/azure-vnet directory. When the CNI binary runs first, it creates the directory with 0644 permissions. This causes permission denied errors for the cns. Even if k8s creates/mounts the /var/run/azure-vnet directory later, it will see it already exists and won't recreate the directory with the 0755 permissions.

What you expected to happen:
The CNI binary should create the directory with 0755 permissions.

How to reproduce it:
Reboot the VM with the cns capabilities security context dropping all capabilities (so it doesn't bypass permission checks). There is a chance that the azure-cns pod will get stuck in crash loop backoff.

Orchestrator and Version (e.g. Kubernetes, Docker):

Operating System (Linux/Windows):

Kernel (e.g. uanme -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):

Anything else we need to know?:
[Miscellaneous information that will assist in solving the issue.]

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

bugcniRelated to CNI.exempt-staleKeep this freshneeds-backportChange needs to be backported to previous release trains

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions