Description
In the Kubernetes/CSI scenario, users need to use CSI to provide storage volumes to regular containers. After CSI completes the relevant operations, kubelet provides a HostPath and ContainerPath, but does not include a Mount Type. In other words, the Container Mount information that Containerd can retrieve through CRI does not include a type. When converting to the OCI Spec, the Mount type is set to "bind" by default.
Unfortunately, this default does not work in the Kata Containers 3.0/runtime-rs with Direct Volume scenario.
When using a direct volume, Kata Pod needs to determine the type of DirectVolume based on Mount.Type after obtaining mount information. Then, it needs to choose the appropriate way to insert the backend device into the Guest, such as directvol, vfiovol, or spdkvol. Finally, the mount operation is completed in the Guest.
Kata 3.0/runtime-rs can already use a direct volume through containerd's ctr; for more details, see how-to-run-kata-containers-with-kinds-of-Block-Volumes.
An example is shown below:
$ cat ./kubelet/kata-direct-vol-002/directvol002/mountInfo.json
{
"device": "/tmp/stor/rawdisk01.20g",
"volume_type": "directvol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
$ sudo ctr run -t --rm --runtime io.containerd.kata.v2 --mount type=directvol,src=/kubelet/kata-direct-vol-002/directvol002,dst=/disk002,options=rbind:rw "$image" kata-direct-vol-xx05302045 /bin/bash
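To make the role of volume_type concrete, here is a minimal Go sketch that decodes a mountInfo.json like the one above. The struct and the function name parseMountInfo are hypothetical, for illustration only, and are not taken from the Kata code base; metadata is kept as raw JSON because the example stores it as a string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MountInfo mirrors the fields of the mountInfo.json example (hypothetical type).
type MountInfo struct {
	Device     string          `json:"device"`
	VolumeType string          `json:"volume_type"`
	FsType     string          `json:"fs_type"`
	Metadata   json.RawMessage `json:"metadata"` // left undecoded on purpose
	Options    []string        `json:"options"`
}

// parseMountInfo decodes the JSON and rejects entries without a volume type,
// since the volume type is what selects directvol, vfiovol or spdkvol handling.
func parseMountInfo(data []byte) (MountInfo, error) {
	var mi MountInfo
	if err := json.Unmarshal(data, &mi); err != nil {
		return MountInfo{}, fmt.Errorf("decode mountInfo: %w", err)
	}
	if mi.VolumeType == "" {
		return MountInfo{}, fmt.Errorf("mountInfo has no volume_type")
	}
	return mi, nil
}

func main() {
	raw := []byte(`{
		"device": "/tmp/stor/rawdisk01.20g",
		"volume_type": "directvol",
		"fs_type": "ext4",
		"metadata": "{}",
		"options": []
	}`)
	mi, err := parseMountInfo(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mi.VolumeType, mi.Device, mi.FsType)
}
```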
However, ordinary containers and Kata Containers differ in how they use CSI to access storage. The Mount Type in the OCI Spec provided by K8S/Containerd defaults to bind, which is what ordinary containers expect, but Kata/DirectVolume requires a specific type such as "directvol" (or vfiovol, spdkvol). Therefore, in the K8S/CSI scenario, Kata Containers needs the Mount in its OCI Spec to carry the specific volume type instead of bind when creating a container that uses a Direct Volume. In other words, the default bind type has to be changed to the specific volume type at the right time and then passed to the Kata runtime-rs.
CDI can already edit the OCI Spec for third-party devices in K8S/DevicePlugin using the Container Annotation/CDIDevice information. It could also be extended to CSI so that it can edit Container Mounts.
In the K8S/CSI/Kata-runtime-rs scenario using Direct Volume, we hope that CDI can automatically edit a Mount in the OCI Spec according to the Direct Volume type, that is, to change the original "bind" to the "directvol" of the corresponding volume_type in the mountInfo.json instance.
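As a rough illustration of the intended effect on the OCI Spec, the sketch below swaps a "bind" mount for a "directvol" mount with the same container path. The Mount type and applyMountEdit are hypothetical stand-ins, not the OCI or CDI Go types, and the sketch assumes the edit replaces any existing mount that has the same destination.

```go
package main

import "fmt"

// Mount is a simplified stand-in for an OCI Spec mount entry (hypothetical type).
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// applyMountEdit drops any mount with the same destination as edit and then
// appends edit, so the container ends up with exactly one mount for that path.
func applyMountEdit(mounts []Mount, edit Mount) []Mount {
	out := make([]Mount, 0, len(mounts)+1)
	for _, m := range mounts {
		if m.Destination != edit.Destination {
			out = append(out, m)
		}
	}
	return append(out, edit)
}

func main() {
	// What containerd produces by default: every CRI mount becomes "bind".
	mounts := []Mount{
		{Destination: "/etc/hosts", Type: "bind", Source: "/run/hosts"},
		{Destination: "/disk01", Type: "bind", Source: "/kubelet/vol", Options: []string{"rbind", "rw"}},
	}
	// What the proposed CDI mount edit would carry for the direct volume.
	edit := Mount{Destination: "/disk01", Type: "directvol", Source: "/kubelet/vol", Options: []string{"rbind", "rw"}}

	for _, m := range applyMountEdit(mounts, edit) {
		fmt.Printf("%s type=%s\n", m.Destination, m.Type)
	}
}
```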
Based on this idea, I have made an initial design and implementation. The scheme is as follows.
- CSI
(1) Do kata-ctl direct-volume add
DoAddDirectVolume(mount.HostPath)
(2) Do cdi config generation in /var/run/cdi/xxx.json
DoGenerateCDIConfig(mount.HostPath, mount.ContainerPath)
- containerd
Set the special vendor/class for direct volume to direct.volume/direct-volume.
(1) Add the parameter CDIMounts []*runtime.Mount to the WithCDI function.
(2) Use the base64-encoded string of Mount.HostPath as the deviceName to build a mount device that conforms to the CDI specification.
encodedMntpath := DoEncodeBase64(mount.HostPath)
mountDevice := "direct.volume/direct-volume=" + encodedMntpath
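The two lines above can be sketched as a self-contained Go function. DoEncodeBase64 is not defined in this proposal, so the sketch assumes it is the padded URL-safe base64 encoding from the Go standard library (base64.URLEncoding); the function name mountDeviceName is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// directVolumeKind is the vendor/class proposed for direct volumes.
const directVolumeKind = "direct.volume/direct-volume"

// mountDeviceName builds a CDI-style qualified name "vendor/class=name" where
// the name part is the URL-safe, padded base64 encoding of the host path.
// Assumption: DoEncodeBase64 behaves like base64.URLEncoding.
func mountDeviceName(hostPath string) string {
	encoded := base64.URLEncoding.EncodeToString([]byte(hostPath))
	return directVolumeKind + "=" + encoded
}

func main() {
	// prints "direct.volume/direct-volume=L3Zhci9saWIv"
	fmt.Println(mountDeviceName("/var/lib/"))
}
```

Depending on the length of the path, the encoded name ends with zero, one or two "=" padding characters, which is why the name check in CDI matters for this scheme.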
The CDI config for the Mount is shown below:
# cat /var/run/cdi/kata-cdi-L3Zhci9saWIv.json
{
"cdiVersion": "0.6.0",
"kind": "direct.volume/direct-volume",
"devices": [
{
"name": "L3Zhci9saWIva3ViZWxldC9wb2RzLzBhNmY5OGQ4LWM4YjgtNDgwOS05ZTZlLTEwMWJlNTA0MTM2MS92b2x1bWVzL2t1YmVybmV0ZXMuaW9+ZW1wdHktZGlyL2NkaS1kaXJlY3QteHZvbHg5MDAxMA==",
"containerEdits": {
"mounts": [
{
"hostPath": "/var/lib/kubelet/pods/0a6f98d8-c8b8-4809-9e6e-101be5041361/volumes/kubernetes.io~csi/cdi-direct-xvolx90010",
"containerPath": "/disk01",
"options": [
"rbind",
"rw"
],
"type": "directvol"
}
]
}
}
]
}
The draft code is shown below:
// WithCDI updates OCI spec with CDI content
- func WithCDI(annotations map[string]string, CDIDevices []*runtime.CDIDevice) oci.SpecOpts {
+ func WithCDI(annotations map[string]string, CDIDevices []*runtime.CDIDevice, CDIMounts []*runtime.Mount) oci.SpecOpts {
    return func(ctx context.Context, client oci.Client, c *containers.Container, s *oci.Spec) error {
        seen := make(map[string]bool)
        // Add devices from CDIDevices CRI field
        var devices []string
        var err error
        ...
+       for _, mount := range CDIMounts {
+           // encoded mount path
+           encodedMntpath := DoEncodeBase64(mount.HostPath)
+           mntDevice := "direct.volume/direct-volume=" + encodedMntpath
+           if seen[mntDevice] {
+               log.G(ctx).Debugf("Skipping duplicated CDI device %s", mntDevice)
+               continue
+           }
+           devices = append(devices, mntDevice)
+           seen[mntDevice] = true
+       }
        ...
        return oci.WithCDIDevices(devices...)(ctx, client, c, s)
    }
}
- CDI
(1) To support using the base64-URL-encoded string of the Mount HostPath as the DeviceName, the CDI DeviceName validation rules need to be modified to allow the special symbol =.
(2) The runtime.Mount object in containerd carries no additional information that would identify which HostPath belongs to a direct volume, so every Mount.HostPath is used to construct a DeviceName and passed to CDI. CDI then matches these DeviceNames against the devices in its cache. In the end, only the Mounts that belong to a direct volume resolve, and all other names remain unresolved. Therefore, the CDI InjectDevices function needs to be modified so that it does not return an error when there are unresolved device names.
$ git diff container-device-interface/pkg/cdi/cache.go
diff --git a/container-device-interface/pkg/cdi/cache.go b/container-device-interface/pkg/cdi/cache.go
index cb495ebb3..01af15deb 100644
--- a/container-device-interface/pkg/cdi/cache.go
+++ b/container-device-interface/pkg/cdi/cache.go
@@ -243,15 +243,14 @@ func (c *Cache) InjectDevices(ociSpec *oci.Spec, devices ...string) ([]string, e
}
if unresolved != nil {
- return unresolved, fmt.Errorf("unresolvable CDI devices %s",
- strings.Join(devices, ", "))
+ fmt.Printf("unresolvable CDI devices %s", strings.Join(devices, ", "))
}
if err := edits.Apply(ociSpec); err != nil {
return nil, fmt.Errorf("failed to inject devices: %w", err)
}
- return nil, nil
+ return unresolved, nil
}
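The behavior change in the diff above can be illustrated in isolation with a small Go sketch. It does not use the CDI library; the cache is modeled as a plain set of known device names, and resolveDevices is a hypothetical helper that only shows the "resolve what matches, report the rest without failing" idea.

```go
package main

import "fmt"

// resolveDevices splits the requested names into those present in the cache
// and those that are not. Unlike the original InjectDevices behavior, an
// unresolved name is reported to the caller but is not treated as an error.
func resolveDevices(cache map[string]bool, names ...string) (resolved, unresolved []string) {
	for _, name := range names {
		if cache[name] {
			resolved = append(resolved, name)
		} else {
			unresolved = append(unresolved, name)
		}
	}
	return resolved, unresolved
}

func main() {
	// Only the direct volume mount has a CDI config; the other mount
	// (for example /etc/hosts) was encoded too but has no entry.
	cache := map[string]bool{
		"direct.volume/direct-volume=L3Zhci9saWIv": true,
	}
	resolved, unresolved := resolveDevices(cache,
		"direct.volume/direct-volume=L3Zhci9saWIv",
		"direct.volume/direct-volume=L2V0Yy9ob3N0cw==",
	)
	fmt.Println("resolved:", resolved)
	fmt.Println("unresolved (ignored):", unresolved)
}
```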