Mountpoint for Amazon S3 supports caching file system metadata and object content to reduce cost and improve performance for repeated reads to the same file. The CSI Driver allows you to configure caching of Mountpoint in your PersistentVolume (PV) definition. See Mountpoint's caching configuration for more details about caching.
The metadata-ttl <SECONDS|indefinite|minimal> flag in mountOptions controls the time-to-live (TTL) for cached metadata entries. It can be set to a positive numerical value in seconds, or to one of the pre-configured values of minimal (default configuration when not using Data Cache) or indefinite (metadata entries never expire).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - metadata-ttl indefinite # <SECONDS|indefinite|minimal>
  csi:
    driver: s3.csi.aws.com
    # ...
See Mountpoint's documentation for more details about metadata cache.
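As a variation on the example above, the following is a minimal sketch of a numeric TTL. The 300-second value is an illustrative assumption, not a recommendation from Mountpoint or the CSI Driver; pick a value that matches how often the objects in your bucket change.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    # Assumed value for illustration: cached metadata entries expire after 300 seconds
    - metadata-ttl 300
  csi:
    driver: s3.csi.aws.com
    # ...
```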
Mountpoint supports different types of data caching that you can opt in to accelerate repeated read requests.
The CSI Driver allows you to configure an emptyDir or a generic ephemeral volume as a local cache. The CSI Driver mounts the provided cache volume to the Mountpoint Pod and configures Mountpoint to use that volume as local cache.
See Mountpoint's documentation for more details about local cache.
You can specify emptyDir as cache type in your PV to use an emptyDir volume as local cache:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir
      cacheEmptyDirSizeLimit: 2Gi # optional but highly recommended!
      cacheEmptyDirMedium: Memory # optional
Both cacheEmptyDirSizeLimit and cacheEmptyDirMedium are optional, but we highly recommend you specify a size limit on your cache, as it might otherwise use all your node's storage depending on the cluster's configuration. If cacheEmptyDirMedium is not specified, the default storage medium will be used.
The emptyDir will be unique to each Mountpoint Pod and won't be shared with other Mountpoint instances.
See Kubernetes's documentation for more details about emptyDir.
You can specify ephemeral as cache type alongside a StorageClass and storage size in your PV to use a generic ephemeral volume as local cache:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: ephemeral
      cacheEphemeralStorageClassName: gp2 # required
      cacheEphemeralStorageResourceRequest: 4Gi # required
The CSI Driver will create a PersistentVolumeClaim (PVC) template within the Mountpoint Pod's volumes, using the configured values and the ReadWriteOnce access mode, so that a unique PVC is created for the Mountpoint Pod.
Both cacheEphemeralStorageClassName and cacheEphemeralStorageResourceRequest are required: they specify the StorageClass name and the storage size to request from that StorageClass, respectively.
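To make the PVC template more concrete, here is a rough sketch of what the resulting generic ephemeral volume entry in a Mountpoint Pod might look like for the gp2 and 4Gi values used in the ephemeral example. The CSI Driver generates this for you; the exact fields it sets are an assumption here. The volume name local-cache and the label are taken from the kubectl describe output shown later in this document, and the rest follows the standard Kubernetes generic ephemeral volume shape.

```yaml
# Hypothetical excerpt of a Mountpoint Pod spec (generated by the CSI Driver, not written by hand)
volumes:
  - name: local-cache
    ephemeral:
      volumeClaimTemplate:
        metadata:
          labels:
            s3.csi.aws.com/type: local-ephemeral-cache
        spec:
          accessModes:
            - ReadWriteOnce # access mode used by the CSI Driver for the cache PVC
          storageClassName: gp2 # from cacheEphemeralStorageClassName
          resources:
            requests:
              storage: 4Gi # from cacheEphemeralStorageResourceRequest
```

Because the claim is created from a template inside the Pod, its lifetime is tied to the Mountpoint Pod rather than to your workload's own PVC.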
Using the ephemeral cache type, you can use the Amazon Elastic Block Store (EBS) CSI driver to dynamically provision an EBS volume or use Local Volume Static Provisioner to access your Amazon EC2 Instance Store. See examples below for more details.
First, make sure to install the EBS CSI Driver in your cluster by following its installation guide.
You can then create a StorageClass backed by the EBS CSI Driver, which the Mountpoint CSI Driver uses to request a volume for its local cache:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-cache-ebs-sc
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
parameters: # all optional, see https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/parameters.md for more details
  type: io2
  iopsPerGB: "256000"
  blockExpress: "true"
You can then reference this StorageClass from your S3 PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: ephemeral
      cacheEphemeralStorageClassName: s3-cache-ebs-sc
      cacheEphemeralStorageResourceRequest: 10Gi
With this configuration, once your workload is scheduled onto a node, Mountpoint CSI Driver will schedule a Mountpoint Pod to the same node with the ephemeral volume. EBS CSI Driver will then dynamically provision an EBS volume and attach it to the node for Mountpoint to use as cache.
The EBS volume and the Mountpoint Pod (therefore its ephemeral PVC) will be automatically cleaned up once the workload is terminated. We highly recommend you use reclaimPolicy: Delete in your StorageClass to ensure the cache PV is automatically cleaned up as part of this process.
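Since the StorageClass parameters are all optional, a simpler variant is also possible. The sketch below assumes a general-purpose gp3 volume is sufficient for your cache. The StorageClass name is hypothetical, and whether gp3 performance suits your workload is something you would need to verify yourself.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-cache-ebs-gp3-sc # hypothetical name, chosen for this example
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete # lets the cache PV be cleaned up with the Mountpoint Pod
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3 # assumed volume type; other parameters are left at the driver's defaults
```

You would then set cacheEphemeralStorageClassName to this StorageClass name in your S3 PV.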
Some Amazon EC2 instances offer non-volatile memory express (NVMe) solid state drives (SSD) instance store volumes. You can utilize Local Volume Static Provisioner to use instance store as cache. See Instance store volume limits for EC2 instances for more details about instance store support on EC2 instances, and EKS Persistent Volumes for Instance Store on using instance storage in EKS.
The Local Volume Static Provisioner allows you to configure various options. You can find more details in its Getting started guide.
As an example, you can configure your eksctl configuration to mount available NVMe instance storage disks at /dev/disk/kubernetes:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cluster-with-storage
  region: eu-central-1
managedNodeGroups:
  - name: storage-nvme
    desiredCapacity: 2
    instanceType: i3.8xlarge
    amiFamily: AmazonLinux2023
    preBootstrapCommands:
      - |
        cat <<EOF > /etc/udev/rules.d/90-kubernetes-discovery.rules
        # Discover Instance Storage disks so kubernetes local provisioner can pick them up from /dev/disk/kubernetes
        KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon EC2 NVMe Instance Storage", ATTRS{serial}=="?*", SYMLINK+="disk/kubernetes/nvme-\\\$attr{model}_\\\$attr{serial}", OPTIONS="string_escape=replace"
        EOF
      - udevadm control --reload && udevadm trigger
The i3.8xlarge instance type provides four NVMe instance storage disks. After applying your changes using eksctl, you can install the example EKS NVMe manifest:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/sig-storage-local-static-provisioner/refs/heads/master/helm/generated_examples/eks-nvme-ssd.yaml
This should create a StorageClass named nvme-ssd and eight PVs, one for each local NVMe instance storage disk attached to the two instances (four per instance):
$ kubectl get sc nvme-ssd
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nvme-ssd kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 17s
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
local-pv-12305867 1769Gi RWO Delete Available nvme-ssd <unset> 60s
local-pv-12342524 1769Gi RWO Delete Available nvme-ssd <unset> 60s
local-pv-30a97d4d 1769Gi RWO Delete Available nvme-ssd <unset> 60s
local-pv-5a838bd7 1769Gi RWO Delete Available nvme-ssd <unset> 60s
local-pv-743f383d 1769Gi RWO Delete Available nvme-ssd <unset> 49s
local-pv-dae2484 1769Gi RWO Delete Available nvme-ssd <unset> 49s
local-pv-ea190b38 1769Gi RWO Delete Available nvme-ssd <unset> 49s
local-pv-ef5d9823 1769Gi RWO Delete Available nvme-ssd <unset> 49s
You can now specify StorageClass nvme-ssd in your PV's configuration with the ephemeral cache type:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: ephemeral
      cacheEphemeralStorageClassName: nvme-ssd
      cacheEphemeralStorageResourceRequest: 10Gi
Note that because NVMe instance storage disks are local to their nodes, you need to ensure your workload, and therefore the Mountpoint Pod, is scheduled onto a node that has local NVMe and an associated PV available.
You can use nodeSelector or Node affinity rules to achieve that.
For example, this configuration would ensure that your workload is scheduled on a node from the eksctl node group storage-nvme:
apiVersion: v1
kind: Pod
metadata:
  name: workload
spec:
  containers:
    # ...
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: s3-pvc
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: alpha.eksctl.io/nodegroup-name
                operator: In
                values:
                  - storage-nvme
          # OR using node name
          # - matchExpressions:
          #     - key: kubernetes.io/hostname
          #       operator: In
          #       values:
          #         - ip-192-0-2-0.region-code.compute.internal
After deploying your workload, the Mountpoint Pod should also be deployed to the same node automatically with a local NVMe PV attached to it:
$ kubectl describe po -n mount-s3
Name: mp-ql5rd
Namespace: mount-s3
...
Volumes:
...
local-cache:
Type: EphemeralVolume (an inline specification for a volume that gets created and deleted with the pod)
StorageClass: nvme-ssd
Volume:
Labels: s3.csi.aws.com/type=local-ephemeral-cache
Annotations: <none>
Capacity:
Access Modes:
VolumeMode: Filesystem
$ kubectl describe pvc -n mount-s3
Name: mp-xt6c4-local-cache
Namespace: mount-s3
StorageClass: nvme-ssd
Status: Bound
Volume: local-pv-743f383d
Labels: s3.csi.aws.com/type=local-ephemeral-cache
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1769Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: mp-ql5rd
Note that if there is no local NVMe available on the scheduled node, the Mountpoint Pod will fail to schedule and your workload will hang in the Pending state. You can run kubectl describe pods -n mount-s3 to describe your Mountpoint Pod and see whether it has any unsatisfied scheduling requirements. In this case, the CSI Driver emits a helpful error message prompting you to check your Mountpoint Pod's status.
You must ensure your workload (and therefore the Mountpoint Pod) is scheduled to a node with local NVMe available to use.
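The nodeSelector alternative mentioned earlier can be sketched as follows. This assumes the same eksctl node group label (alpha.eksctl.io/nodegroup-name: storage-nvme) used in the node affinity example and expresses the same constraint in a shorter form.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workload
spec:
  # Schedule only onto nodes from the eksctl node group with NVMe instance storage
  nodeSelector:
    alpha.eksctl.io/nodegroup-name: storage-nvme
  containers:
    # ...
  volumes:
    - name: vol
      persistentVolumeClaim:
        claimName: s3-pvc
```

A nodeSelector is enough when a single label match is all you need; node affinity is the better fit if you want multiple alternative terms, such as matching by hostname.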
Be sure to review the other configuration options of the Local Volume Static Provisioner, including the Local Volume Node Cleanup Controller for volume cleanup.
With the CSI Driver v1, the Mountpoint instances were spawned on the host using systemd, and the cache flag in mountOptions was a path on the host. The cache folder also needed to exist for Mountpoint to use it. We have deprecated this usage, and the CSI Driver will fall back to using emptyDir with the default storage medium and no size limit.
You no longer need to create a cache folder on the host, and the configured path will be ignored by the CSI Driver v2! We recommend customers migrate to emptyDir and specify a limit.
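As a sketch of the recommended migration target, the PV below drops the cache entry from mountOptions and configures an emptyDir cache with an explicit limit. The 2Gi value is an assumption for illustration; size it according to your node's available storage.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ... (no cache entry in mountOptions)
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir
      cacheEmptyDirSizeLimit: 2Gi # assumed size; prevents the cache from filling the node's storage
```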
For this deprecated use of the cache configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - cache /cache/folder/on/host
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
the CSI Driver will ignore the cache path and will create an emptyDir cache volume instead. The end result will be the same as this configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  # ...
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir
When mounting an S3 bucket, you can opt in to a shared cache in Amazon S3 Express One Zone. You should use the shared cache if you repeatedly read small objects (up to 1 MB) from multiple compute instances, or the size of the dataset that you repeatedly read often exceeds the size of your local cache. This improves latency when reading the same data repeatedly from multiple instances by avoiding redundant requests to your mounted S3 bucket. To enable shared cache, specify the cache-xz flag in mountOptions with your directory bucket name:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - cache-xz amzn-s3-demo-bucket--usw2-az1--x-s3
  csi:
    driver: s3.csi.aws.com
    # ...
See Mountpoint's documentation for more details about shared cache.
You can opt in to a local cache and shared cache together if you have unused space on your instance, but also want to share the cache across multiple instances. This avoids redundant read requests from the same instance to the shared cache in S3 directory bucket when the required data is cached in local storage, reducing request cost as well as improving performance. To opt in to local and shared cache together, you can specify both the Local Cache and Shared Cache in your PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    - cache-xz amzn-s3-demo-bucket--usw2-az1--x-s3
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      cache: emptyDir
      cacheEmptyDirSizeLimit: 2Gi
See Mountpoint's documentation for more details about combined local and shared cache.
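The example above pairs the shared cache with an emptyDir local cache. The sketch below assumes the ephemeral cache type combines with cache-xz in the same way, which this document does not show explicitly, so treat it as untested. It reuses the s3-cache-ebs-sc StorageClass from the EBS example.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  mountOptions:
    # Shared cache in an S3 Express One Zone directory bucket
    - cache-xz amzn-s3-demo-bucket--usw2-az1--x-s3
  csi:
    driver: s3.csi.aws.com
    # ...
    volumeAttributes:
      bucketName: amzn-s3-demo-bucket
      # Local cache backed by a generic ephemeral volume (assumed to combine with cache-xz)
      cache: ephemeral
      cacheEphemeralStorageClassName: s3-cache-ebs-sc
      cacheEphemeralStorageResourceRequest: 10Gi
```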