feature: PoC to mknod devices for user namespace containers #5137

@everzakov

User namespaces have been GA in Kubernetes since 1.36. However, raw block devices are not supported with them.
If I understand correctly, the reason is that mknod cannot be called in a non-initial user namespace, which is why the device is bind-mounted instead.

The bind mount has some limitations:

  1. The device must already exist on the host under the same name.
  2. The device keeps the characteristics of the host device. For example, the owner uid and gid come from the initial user namespace, which is why the device is shown as owned by nobody/nogroup.
  3. The device must grant permissions to other users, because the host user and the container user are not the same. Otherwise, access fails with a Permission denied error. For this reason, a device with 0660 permissions cannot be passed.
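
Limitations 2 and 3 can be illustrated with a small simulation of the ownership check. This is only a sketch: the mapping `0 100000 65536`, the device owner `0:6`, and the helper names `IdMap` and `can_open_rw` are assumptions made for the example, and the check is simplified (no supplementary groups, no capabilities).

```python
from dataclasses import dataclass

# Default value of /proc/sys/kernel/overflowuid; unmapped IDs are shown as
# this ID, which usually resolves to nobody/nogroup.
OVERFLOW_ID = 65534


@dataclass(frozen=True)
class IdMap:
    """One line of /proc/<pid>/uid_map or gid_map (hypothetical helper)."""
    container_id: int
    host_id: int
    length: int

    def to_host(self, cid: int) -> int:
        """Translate a container ID into the host ID it is mapped to."""
        if not self.container_id <= cid < self.container_id + self.length:
            raise ValueError(f"id {cid} is not mapped")
        return self.host_id + (cid - self.container_id)

    def to_container(self, hid: int) -> int:
        """Translate a host ID into what the container sees."""
        if self.host_id <= hid < self.host_id + self.length:
            return self.container_id + (hid - self.host_id)
        return OVERFLOW_ID  # host ID has no mapping in this namespace


def can_open_rw(mode: int, owner_uid: int, owner_gid: int,
                proc_uid: int, proc_gid: int) -> bool:
    """Simplified owner/group/other check for read+write, using host IDs.

    Capabilities are ignored on purpose: the assumption here is that root
    inside the container gets no override for a file whose owner is not
    mapped into its user namespace.
    """
    if proc_uid == owner_uid:
        bits = (mode >> 6) & 0o7
    elif proc_gid == owner_gid:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return bits & 0o6 == 0o6


if __name__ == "__main__":
    id_map = IdMap(container_id=0, host_id=100000, length=65536)
    proc_uid = proc_gid = id_map.to_host(0)   # container root on the host
    print(id_map.to_container(0))             # owner as seen in the container
    print(can_open_rw(0o660, 0, 6, proc_uid, proc_gid))
    print(can_open_rw(0o666, 0, 6, proc_uid, proc_gid))
```

With these assumed values the host owner is displayed as 65534 inside the container, a 0660 device is rejected, and only a device with permissions for other users (0666) is accepted.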

There are several ways to address these limitations:

  1. Allow mknod to be called in a non-initial user namespace. This is currently restricted by the kernel.
  2. Grant other users permission to use the device. However, this is a security issue, because every user would be able to use the device.
  3. Chown the parent device so that its owner id maps into the container's user namespace. However, the same device could then not be reused across several pods (they have different user namespaces).
  4. Support idmapped mounts (idmap) for devtmpfs. Currently, idmap is supported only by some filesystems.
  5. Call mknod from the initial user namespace, inside the container's mount namespace, to pass the device to the container.
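
A minimal sketch of what option 5 could look like, not the PoC's actual code. It assumes Python 3.12+ (for `os.setns`), a caller that is root in the initial user namespace, a single-range ID mapping starting at container ID 0, and a target filesystem that permits device nodes. The function names and the chown to the mapped host IDs are illustrative assumptions.

```python
import os
import stat


def device_node_args(major: int, minor: int, perm: int,
                     container_uid: int, container_gid: int,
                     uid_host_base: int, gid_host_base: int):
    """Return (mode, rdev, host_uid, host_gid) for a block device node.

    Assumes container ID 0 maps to *_host_base (single-range mapping).
    """
    mode = stat.S_IFBLK | perm
    rdev = os.makedev(major, minor)
    return (mode, rdev,
            uid_host_base + container_uid,
            gid_host_base + container_gid)


def mknod_in_container(pid: int, path: str, major: int, minor: int,
                       perm: int, container_uid: int, container_gid: int,
                       uid_host_base: int, gid_host_base: int) -> None:
    """Create the node inside the mount namespace of process `pid`.

    The caller stays in the initial user namespace and only switches its
    mount namespace, so the kernel's mknod restriction does not apply.
    The calling process remains in the container's mount namespace
    afterwards, so this is best run in a short-lived child process.
    """
    mode, rdev, uid, gid = device_node_args(
        major, minor, perm, container_uid, container_gid,
        uid_host_base, gid_host_base)
    fd = os.open(f"/proc/{pid}/ns/mnt", os.O_RDONLY)
    try:
        os.setns(fd, os.CLONE_NEWNS)  # join the container's mount namespace
    finally:
        os.close(fd)
    os.mknod(path, mode, rdev)  # permission bits are still subject to umask
    os.chmod(path, perm)        # set the intended permissions explicitly
    os.chown(path, uid, gid)    # owner shows up as a mapped ID in the container
```

For example, `mknod_in_container(pid, "/dev/sdb", 8, 16, 0o660, 0, 0, 100000, 100000)` would create a 0660 node owned by the container's root, with `pid`, the path, and the numbers all being placeholder values.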

For the PoC, option 5 (calling mknod instead of bind-mounting) was used, because other solutions require kernel changes.
The current PoC limitations all relate to CRIU:

  1. The device's major/minor numbers must be the same at checkpoint and restore.
  2. The user namespace mapping (host id, container id, length) must be the same at checkpoint and restore.
  3. Only simple mount scenarios have been checked (e.g. it has not been checked what happens if the user mounts the container's dev to some other path).
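
The first two limitations could be verified before a restore with a check along these lines. This is a hypothetical helper, not part of CRIU or the PoC: `DeviceState`, `parse_id_map`, and `restore_mismatches` are names invented for the example, and only a single-line ID mapping is handled.

```python
from typing import List, NamedTuple, Tuple


class DeviceState(NamedTuple):
    """What would have to match between checkpoint and restore."""
    major: int
    minor: int
    id_map: Tuple[int, int, int]  # (container id, host id, length)


def parse_id_map(line: str) -> Tuple[int, int, int]:
    """Parse one line of /proc/<pid>/uid_map: 'inside outside length'."""
    container_id, host_id, length = (int(field) for field in line.split())
    return container_id, host_id, length


def restore_mismatches(checkpoint: DeviceState,
                       restore: DeviceState) -> List[str]:
    """Return a description of every field that differs; empty if none."""
    problems = []
    if (checkpoint.major, checkpoint.minor) != (restore.major, restore.minor):
        problems.append(
            f"device number changed: {checkpoint.major}:{checkpoint.minor}"
            f" -> {restore.major}:{restore.minor}")
    if checkpoint.id_map != restore.id_map:
        problems.append(
            f"id mapping changed: {checkpoint.id_map} -> {restore.id_map}")
    return problems
```

An empty list means both conditions hold for the device. Any entry names the value that changed between checkpoint and restore.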

I would be glad to hear any feedback :)
