Caching Configuration of Mountpoint for Amazon S3 CSI Driver

Mountpoint for Amazon S3 supports caching file system metadata and object content to reduce cost and improve performance for repeated reads to the same file. The CSI Driver allows you to configure caching of Mountpoint in your PersistentVolume (PV) definition. See Mountpoint's caching configuration for more details about caching.

Metadata Cache

The metadata-ttl <SECONDS|indefinite|minimal> flag in mountOptions controls the time-to-live (TTL) for cached metadata entries. It can be set to a positive numerical value in seconds, or to one of the pre-configured values of minimal (default configuration when not using Data Cache) or indefinite (metadata entries never expire).

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - metadata-ttl indefinite # <SECONDS|indefinite|minimal>
  csi:
    driver: s3.csi.aws.com
    # ...

See Mountpoint's documentation for more details about metadata cache.
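
As a minimal sketch of the numeric form, the following PV sets a fixed TTL in seconds instead of one of the pre-configured values. The value 300 is an illustrative assumption, not a recommendation; choose a TTL that matches how often the objects in your bucket change.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - metadata-ttl 300 # illustrative value: cached metadata entries expire after 300 seconds
  csi:
    driver: s3.csi.aws.com
    # ...
```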

Data Cache

Mountpoint supports several types of data caching that you can opt in to in order to accelerate repeated read requests.

Local Cache

The CSI Driver allows you to configure an emptyDir or a generic ephemeral volume as a local cache. The CSI Driver mounts the provided cache volume to the Mountpoint Pod and configures Mountpoint to use that volume as local cache.

See Mountpoint's documentation for more details about local cache.

emptyDir

You can specify emptyDir as cache type in your PV to use an emptyDir volume as local cache:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir
      cacheEmptyDirSizeLimit: 2Gi # optional but highly recommended!
      cacheEmptyDirMedium: Memory # optional

Both cacheEmptyDirSizeLimit and cacheEmptyDirMedium are optional, but we highly recommend you specify a size limit on your cache, as it might otherwise use all your node's storage depending on the cluster's configuration. If cacheEmptyDirMedium is not specified, the default storage medium will be used.

The emptyDir will be unique to each Mountpoint Pod and won't be shared with other Mountpoint instances.

See Kubernetes's documentation for more details about emptyDir.

ephemeral

You can specify ephemeral as cache type alongside a StorageClass and storage size in your PV to use a generic ephemeral volume as local cache:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: ephemeral
      cacheEphemeralStorageClassName: gp2 # required
      cacheEphemeralStorageResourceRequest: 4Gi # required

The CSI Driver creates a PersistentVolumeClaim (PVC) template within the Mountpoint Pod's volumes using the configured values and the ReadWriteOnce access mode, so that a unique PVC is created for each Mountpoint Pod. Both fields are required: cacheEphemeralStorageClassName specifies the StorageClass name, and cacheEphemeralStorageResourceRequest specifies the storage size to request from that StorageClass.

Using the ephemeral cache type, you can use the Amazon Elastic Block Store (EBS) CSI driver to dynamically provision an EBS volume or use Local Volume Static Provisioner to access your Amazon EC2 Instance Store. See examples below for more details.

Using EBS CSI Driver to provision an EBS volume dynamically

First, make sure to install the EBS CSI Driver in your cluster by following their installation guide.

You can then create a StorageClass using the EBS CSI Driver for Mountpoint CSI Driver to request a volume to use as local cache:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-cache-ebs-sc
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
parameters: # all optional, see https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/parameters.md for more details
  type: io2
  iopsPerGB: "256000"
  blockExpress: "true"

You can then reference this StorageClass from your S3 PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: ephemeral
      cacheEphemeralStorageClassName: s3-cache-ebs-sc
      cacheEphemeralStorageResourceRequest: 10Gi

With this configuration, once your workload is scheduled onto a node, Mountpoint CSI Driver will schedule a Mountpoint Pod to the same node with the ephemeral volume. EBS CSI Driver will then dynamically provision an EBS volume and attach it to the node for Mountpoint to use as cache.

The EBS volume and the Mountpoint Pod (and therefore its ephemeral PVC) will be automatically cleaned up once the workload is terminated. We highly recommend you use reclaimPolicy: Delete in your StorageClass to ensure the cache PV is automatically cleaned up as part of this process.

Using Local Volume Static Provisioner to use local NVMe

Some Amazon EC2 instances offer non-volatile memory express (NVMe) solid state drives (SSD) instance store volumes. You can utilize Local Volume Static Provisioner to use instance store as cache. See Instance store volume limits for EC2 instances for more details about instance store support on EC2 instances, and EKS Persistent Volumes for Instance Store on using instance storage in EKS.

The Local Volume Static Provisioner allows you to configure various options. You can find more details in their Getting started guide.

As an example, you can configure your eksctl configuration to mount available NVMe instance storage disks at /dev/disk/kubernetes:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cluster-with-storage
  region: eu-central-1
managedNodeGroups:
  - name: storage-nvme
    desiredCapacity: 2
    instanceType: i3.8xlarge
    amiFamily: AmazonLinux2023
    preBootstrapCommands:
      - |
          cat <<EOF > /etc/udev/rules.d/90-kubernetes-discovery.rules
          # Discover Instance Storage disks so kubernetes local provisioner can pick them up from /dev/disk/kubernetes
          KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon EC2 NVMe Instance Storage", ATTRS{serial}=="?*", SYMLINK+="disk/kubernetes/nvme-\\\$attr{model}_\\\$attr{serial}", OPTIONS="string_escape=replace"
          EOF
      - udevadm control --reload && udevadm trigger

The i3.8xlarge instance type provides four NVMe instance storage disks. After applying your changes using eksctl, you can install the example EKS NVMe manifest:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/sig-storage-local-static-provisioner/refs/heads/master/helm/generated_examples/eks-nvme-ssd.yaml

This should create a StorageClass named nvme-ssd and eight PVs, one for each local NVMe instance storage disk attached to the two instances (four per instance):

$ kubectl get sc nvme-ssd
NAME       PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
nvme-ssd   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  17s

$ kubectl get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
local-pv-12305867   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          60s
local-pv-12342524   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          60s
local-pv-30a97d4d   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          60s
local-pv-5a838bd7   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          60s
local-pv-743f383d   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          49s
local-pv-dae2484    1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          49s
local-pv-ea190b38   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          49s
local-pv-ef5d9823   1769Gi     RWO            Delete           Available           nvme-ssd       <unset>                          49s

You can now specify StorageClass nvme-ssd in your PV's configuration with the ephemeral cache type:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: ephemeral
      cacheEphemeralStorageClassName: nvme-ssd
      cacheEphemeralStorageResourceRequest: 10Gi

Because the NVMe instance storage disks are local to their nodes, you need to ensure that your workload, and therefore the Mountpoint Pod, is scheduled onto a node that has local NVMe and an associated PV available. You can use nodeSelector or node affinity rules to achieve that.

For example, this configuration would ensure that your workload is scheduled on a node from the eksctl node group storage-nvme:

apiVersion: v1
kind: Pod
metadata:
  name: workload
spec:
  containers:
    # ...
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: s3-pvc
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: alpha.eksctl.io/nodegroup-name
                operator: In
                values:
                  - storage-nvme
          # OR using node name
          # - matchExpressions:
          #     - key: kubernetes.io/hostname
          #       operator: In
          #       values:
          #         - ip-192-0-2-0.region-code.compute.internal
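
The nodeSelector alternative mentioned above could look like the following sketch. It assumes the same alpha.eksctl.io/nodegroup-name label and storage-nvme node group used in the affinity example; adjust both to match the labels actually present on your nodes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workload
spec:
  containers:
    # ...
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: s3-pvc
  # nodeSelector only supports exact label matches; use node affinity for richer rules
  nodeSelector:
    alpha.eksctl.io/nodegroup-name: storage-nvme
```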

After deploying your workload, the Mountpoint Pod should also be deployed to the same node automatically with a local NVMe PV attached to it:

$ kubectl describe po -n mount-s3
Name:                 mp-ql5rd
Namespace:            mount-s3
...
Volumes:
  ...
  local-cache:
    Type:          EphemeralVolume (an inline specification for a volume that gets created and deleted with the pod)
    StorageClass:  nvme-ssd
    Volume:
    Labels:            s3.csi.aws.com/type=local-ephemeral-cache
    Annotations:       <none>
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem

$ kubectl describe pvc -n mount-s3
Name:          mp-xt6c4-local-cache
Namespace:     mount-s3
StorageClass:  nvme-ssd
Status:        Bound
Volume:        local-pv-743f383d
Labels:        s3.csi.aws.com/type=local-ephemeral-cache
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1769Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       mp-ql5rd

Note that if no local NVMe is available on the scheduled node, the Mountpoint Pod will fail to schedule and your workload will hang in the Pending state. You can run kubectl describe pods -n mount-s3 to describe your Mountpoint Pod and see whether it has any unsatisfied deployment requirements. In this case, the CSI Driver emits a helpful error message prompting you to check your Mountpoint Pod's status.

You must ensure your workload (and therefore the Mountpoint Pod) is scheduled to a node with local NVMe available to use.

Be sure to review the other configuration options of Local Volume Static Provisioner, including the Local Volume Node Cleanup Controller for volume cleanup.

(Deprecated) cache flag via mountOptions

With the CSI Driver v1, the Mountpoint instances were spawned on the host using systemd, and the cache flag in mountOptions was a path relative to the host. The cache folder also had to exist before Mountpoint could use it. We have deprecated this usage, and the CSI Driver will fall back to using emptyDir with the default storage medium and no size limit.

You no longer need to create a cache folder on the host, and the configured path will be ignored by the CSI Driver v2! We recommend customers migrate to emptyDir and specify a limit.

For this deprecated use of the cache configuration:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - cache /cache/folder/on/host
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket

the CSI Driver will ignore the cache path and will create an emptyDir cache volume instead. The end result will be the same as this configuration:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir

Shared Cache

When mounting an S3 bucket, you can opt in to a shared cache in Amazon S3 Express One Zone. You should use the shared cache if you repeatedly read small objects (up to 1 MB) from multiple compute instances, or if the dataset that you repeatedly read often exceeds the size of your local cache. This improves latency when reading the same data repeatedly from multiple instances by avoiding redundant requests to your mounted S3 bucket. To enable the shared cache, specify the cache-xz flag in mountOptions with your directory bucket name:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - cache-xz amzn-s3-demo-bucket--usw2-az1--x-s3
  csi:
    driver: s3.csi.aws.com
    # ...

See Mountpoint's documentation for more details about shared cache.

Combined Local and Shared Cache

You can opt in to a local cache and a shared cache together if you have unused space on your instance but also want to share the cache across multiple instances. This avoids redundant read requests from the same instance to the shared cache in the S3 directory bucket when the required data is already cached in local storage, reducing request cost as well as improving performance. To opt in to both, specify the Local Cache and the Shared Cache in your PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - cache-xz amzn-s3-demo-bucket--usw2-az1--x-s3
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir
      cacheEmptyDirSizeLimit: 2Gi

See Mountpoint's documentation for more details about combined local and shared cache.