Dynamic EBS Volume Sizing #34
base: main
chrismld
left a comment
hello Herbert, thanks for your contribution!! I left a few comments. For me, the blueprint didn't work, would you mind checking again? let me know if you'd like to troubleshoot with my setup :)
thanks again Herb, the blueprint works for me now ... still, I'm requesting a few more changes to make a few things even easier to understand. Also, there's one important change that we need in order to test the whole blueprint easily. Also, please don't resolve the conversations, as they help me track what my feedback was :P
| apiVersion: karpenter.k8s.aws/v1 | ||
| kind: EC2NodeClass | ||
| metadata: | ||
| name: dynamic-disk-volume |
let's use different names for both the NodePool and the EC2NodeClass ... every time we make a big change to the repo we need to test each blueprint; using different names will allow us to run and test all the commands in the blueprint without having to clean up or decide which OS variant to deploy.
both EC2NodeClass still have the same name
| apiVersion: karpenter.k8s.aws/v1 | ||
| kind: EC2NodeClass | ||
| metadata: | ||
| name: dynamic-disk-volume |
same comment as al2023
both EC2NodeClass still have the same name
| ## Purpose | ||
|
|
||
| This blueprint shows how to automatically resize EBS volumes based on the EC2 instance type that Karpenter provisions. EBS volume size requirements differ among different instance types and this pattern ensures that each node gets an appropriately sized root volume without manual intervention. Some use cases this pattern supports: | ||
| * Larger instances host more pods, therefore, it is necessary to have larger EBS volumes to store the corresponding container images. |
I'm still not quite convinced the problem we're solving is clear here. What I'd like to see is something like: "Let's say you have 10 pods that have large storage needs and pull at least three different container images. With Karpenter you'll most likely end up with multiple node sizes, so you might be in trouble if the EBS volume doesn't have enough storage. Therefore, this blueprint helps you resize an EBS volume based on the node size. For instance, this blueprint assumes that if Karpenter launches a 4xlarge instance, the EBS volume size should be 500GB, if it's a 6xlarge it should be 600GB, etc."
You actually have this almost at the end:
"For example, if Karpenter provisioned a c6i.2xlarge instance, you should see that the /dev/xvda device has been automatically resized to 300GB (as per the sizing logic for 2xlarge instances), even though the initial EBS volume was created with only 20GB."
I like this, but having it at the end is already too late, don't you think?
There are several issues around this:
aws/karpenter-provider-aws#2394
kubernetes-sigs/karpenter#1988
- 180 👍
yep, I'm aware, but my point here is more about the story line and not about the issues ... having this clarification before you start deploying will help users to understand what they're about to do
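To make the story line concrete before any deployment step, the sizing rule under discussion can be sketched as a small shell function. The function name and the size values are taken from the `case` statement inside the base64-encoded script attached later in this PR (for example, 2xlarge maps to 300GB and 4xlarge to 500GB). The one deliberate difference is an assumption of this sketch: it uses shell parameter expansion instead of the `sed` call in the original to extract the size suffix.

```shell
#!/usr/bin/env bash
# Sketch: map an EC2 instance type to a target root-volume size in GB.
# Values mirror the case statement in the PR's encoded script.
get_target_size_by_suffix() {
    local instance_type=$1
    # Keep everything after the last dot, e.g. "c6i.2xlarge" -> "2xlarge".
    local size_suffix=${instance_type##*.}

    case "$size_suffix" in
        nano) echo 20 ;;
        micro) echo 30 ;;
        small) echo 40 ;;
        medium) echo 60 ;;
        large) echo 100 ;;
        xlarge) echo 200 ;;
        2xlarge) echo 300 ;;
        3xlarge) echo 400 ;;
        4xlarge) echo 500 ;;
        6xlarge) echo 600 ;;
        8xlarge|9xlarge) echo 800 ;;
        12xlarge) echo 1000 ;;
        16xlarge|18xlarge) echo 1200 ;;
        24xlarge) echo 1500 ;;
        32xlarge|48xlarge|56xlarge|112xlarge) echo 2000 ;;
        metal) echo 1000 ;;
        *) echo 100 ;;   # fallback for any suffix not listed above
    esac
}

# Example instance types chosen for illustration only.
for t in c6i.2xlarge m5.4xlarge m5.6xlarge; do
    echo "$t -> $(get_target_size_by_suffix "$t")GB"
done
```

Showing a table like this near the top of the README would let readers see what each node size gets before they deploy anything.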
| volumeType: gp3 | ||
| deleteOnTermination: true | ||
| encrypted: true | ||
| userData: | |
Can we use the same placeholder (user-data = "<<BASE64_USER_DATA>>") as the Bottlerocket one?
I'm not sure if the same script is going to work, but I think it helps to have consistency here and use the same pattern: either keep a separate script and inject it, or keep the whole script within the EC2NodeClass definition. Whichever approach you think works better long-term should be used in both. I personally think that having a separate script forces you to do a replacement every time you need to deploy a change, so I prefer the al2023 approach.
I consolidated everything in one script but it has to be pass by differently to each OS.
The steps are the same in the guide.
can you help me understand this bit: "but it has to be pass by differently to each OS"?
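For context on the trade-off discussed in this thread, here is a minimal sketch of the "separate script plus placeholder" pattern: encode a script and substitute it for `<<BASE64_USER_DATA>>`. The file names (`resize.sh`, `nodeclass.tmpl`, `nodeclass.toml`) and the script contents are hypothetical stand-ins, not files from this PR; only the placeholder and the `ebsresize` settings lines come from the diff.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)

# Hypothetical stand-in for the real resize script.
cat > "$workdir/resize.sh" <<'EOF'
#!/bin/bash
echo "resize placeholder"
EOF

# Hypothetical template holding the placeholder used in the Bottlerocket variant.
cat > "$workdir/nodeclass.tmpl" <<'EOF'
[settings.bootstrap-containers.ebsresize]
mode = "once"
user-data = "<<BASE64_USER_DATA>>"
EOF

# Encode onto a single line; tr removes the line wrapping some base64 builds add.
encoded=$(base64 < "$workdir/resize.sh" | tr -d '\n')

# Use '|' as the sed delimiter because base64 output can contain '/'.
sed "s|<<BASE64_USER_DATA>>|${encoded}|" "$workdir/nodeclass.tmpl" > "$workdir/nodeclass.toml"

cat "$workdir/nodeclass.toml"
```

This is the extra replacement step the reviewer mentions: every change to the script requires re-encoding and re-injecting it, which is the cost of this approach compared with keeping the script inline.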
|
|
||
| </details> | ||
|
|
||
| Once you have deployed the `EC2NodeClass` and `NodePool` for Amazon Linux 2023 or Bottlerocket, proceed to deploy the test workload: |
can you talk about the needs of this workload? that will help to confirm why it made sense to use 300GB for a 2xlarge instance, say for example: "each pod needs 30GB, and only 10 can fit into a 2xlarge" or something like that.
This is a test workload intended to show that the volume was actually resized.
Is it really relevant that the test complies with the use case?
yes, it is, it helps to understand better what we're doing and showcasing here
| apiVersion: apps/v1 | ||
| kind: Deployment | ||
| metadata: | ||
| name: dynamic-disk-ebs-volume |
let's use a different workload per OS as per my previous comment to keep things separate and it's easier to test the whole blueprint :)
I can duplicate the instructions in each section but it makes no sense to duplicate the deployment manifest since it is the same.
the main concern is that at the moment we don't have an automated way to test all blueprints every time we need to make a major update, we might skip one or the other when doing a test ... perhaps we can just add a command to replace the nodeSelectors for each OS (or at least for one)?
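One way the suggested "command to replace the nodeSelectors" could look is sketched below. Everything specific here is an assumption for illustration: the `NODEPOOL_NAME` token, the per-OS NodePool names `dynamic-disk-volume-al2023` and `dynamic-disk-volume-bottlerocket`, and the trimmed manifest, which is not a complete Deployment. The `karpenter.sh/nodepool` label is used as the selector key on the assumption that the workload targets a NodePool by name.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)

# Trimmed stand-in for the shared test Deployment (not a complete manifest).
cat > "$workdir/workload.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dynamic-disk-ebs-volume
spec:
  template:
    spec:
      nodeSelector:
        karpenter.sh/nodepool: NODEPOOL_NAME
EOF

# Render one copy per OS so both variants can be deployed side by side.
for os in al2023 bottlerocket; do
    sed -e "s|NODEPOOL_NAME|dynamic-disk-volume-${os}|" \
        -e "s|name: dynamic-disk-ebs-volume|name: dynamic-disk-ebs-volume-${os}|" \
        "$workdir/workload.yaml" > "$workdir/workload-${os}.yaml"
    # kubectl apply -f "$workdir/workload-${os}.yaml"   # commented out: needs a cluster
done
```

This keeps a single source manifest, as the author prefers, while still giving each OS its own workload name and selector for testing.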
chrismld
left a comment
thanks for the new changes Herb, there are a few things that still need to be resolved and I added a comment
| userData: | | ||
| [settings.bootstrap-containers.ebsresize] | ||
| mode = "once" | ||
| user-data = "IyEvYmluL2Jhc2gKc2V0IC1lCgpnZXRfaW1kc190b2tlbigpIHsKICAgIGN1cmwgLVggUFVUICJodHRwOi8vMTY5LjI1NC4xNjkuMjU0L2xhdGVzdC9hcGkvdG9rZW4iIFwKICAgICAgICAtSCAiWC1hd3MtZWMyLW1ldGFkYXRhLXRva2VuLXR0bC1zZWNvbmRzOiAyMTYwMCIgXAogICAgICAgIC0tbWF4LXRpbWUgMTAgLS1yZXRyeSAzCn0KCmdldF9tZXRhZGF0YSgpIHsKICAgIGxvY2FsIHBhdGg9JDEKICAgIGxvY2FsIHRva2VuPSQyCiAgICBjdXJsIC1IICJYLWF3cy1lYzItbWV0YWRhdGEtdG9rZW46ICR0b2tlbiIgXAogICAgICAgICJodHRwOi8vMTY5LjI1NC4xNjkuMjU0L2xhdGVzdC9tZXRhLWRhdGEvJHBhdGgiIFwKICAgICAgICAtLW1heC10aW1lIDEwIC0tcmV0cnkgMwp9CgpkZXRlY3Rfb3MoKSB7CiAgICBpZiBbIC1mIC9ldGMvYm90dGxlcm9ja2V0LXJlbGVhc2UgXTsgdGhlbgogICAgICAgIGVjaG8gImJvdHRsZXJvY2tldCIKICAgIGVsaWYgWyAtZiAvZXRjL29zLXJlbGVhc2UgXSAmJiBncmVwIC1xICJBbWF6b24gTGludXgiIC9ldGMvb3MtcmVsZWFzZTsgdGhlbgogICAgICAgIGVjaG8gImFsMjAyMyIKICAgIGZpCn0KCmdldF90YXJnZXRfc2l6ZV9ieV9zdWZmaXgoKSB7CiAgICBsb2NhbCBpbnN0YW5jZV90eXBlPSQxCiAgICBsb2NhbCBzaXplX3N1ZmZpeD0kKGVjaG8gJGluc3RhbmNlX3R5cGUgfCBzZWQgJ3MvLipcLi8vJykKCiAgICBjYXNlICRzaXplX3N1ZmZpeCBpbgogICAgICAgIG5hbm8pIGVjaG8gMjAgOzsKICAgICAgICBtaWNybykgZWNobyAzMCA7OwogICAgICAgIHNtYWxsKSBlY2hvIDQwIDs7CiAgICAgICAgbWVkaXVtKSBlY2hvIDYwIDs7CiAgICAgICAgbGFyZ2UpIGVjaG8gMTAwIDs7CiAgICAgICAgeGxhcmdlKSBlY2hvIDIwMCA7OwogICAgICAgIDJ4bGFyZ2UpIGVjaG8gMzAwIDs7CiAgICAgICAgM3hsYXJnZSkgZWNobyA0MDAgOzsKICAgICAgICA0eGxhcmdlKSBlY2hvIDUwMCA7OwogICAgICAgIDZ4bGFyZ2UpIGVjaG8gNjAwIDs7CiAgICAgICAgOHhsYXJnZXw5eGxhcmdlKSBlY2hvIDgwMCA7OwogICAgICAgIDEyeGxhcmdlKSBlY2hvIDEwMDAgOzsKICAgICAgICAxNnhsYXJnZXwxOHhsYXJnZSkgZWNobyAxMjAwIDs7CiAgICAgICAgMjR4bGFyZ2UpIGVjaG8gMTUwMCA7OwogICAgICAgIDMyeGxhcmdlfDQ4eGxhcmdlfDU2eGxhcmdlfDExMnhsYXJnZSkgZWNobyAyMDAwIDs7CiAgICAgICAgbWV0YWwpIGVjaG8gMTAwMCA7OwogICAgICAgICopIGVjaG8gMTAwIDs7CiAgICBlc2FjCn0KCnJlc2l6ZV9lYnNfZm9yX2luc3RhbmNlKCkgewogICAgbG9jYWwgb3NfdHlwZT0kKGRldGVjdF9vcykKICAgIGxvY2FsIGRldmljZQogICAgCiAgICBjYXNlICRvc190eXBlIGluCiAgICAgICAgYm90dGxlcm9ja2V0KQogICAgICAgICAgICBkZXZpY2U9Ii9kZXYveHZkYiIKICAgICAgICAgICAgOzsKICAgICAgICBhbDIwMjMpCiAgICAgICAgICAgIGRldmljZT0iL2Rldi94dmRhIgogI
CAgICAgICAgICA7OwogICAgZXNhYwoKICAgIGxvY2FsIHRva2VuPSQoZ2V0X2ltZHNfdG9rZW4pCiAgICBpZiBbIC16ICIkdG9rZW4iIF07IHRoZW4KICAgICAgICBlY2hvICJGYWlsZWQgdG8gZ2V0IElNRFN2MiB0b2tlbiIKICAgICAgICByZXR1cm4gMQogICAgZmkKCiAgICBsb2NhbCBpbnN0YW5jZV9pZD0kKGdldF9tZXRhZGF0YSAiaW5zdGFuY2UtaWQiICIkdG9rZW4iKQogICAgbG9jYWwgcmVnaW9uPSQoZ2V0X21ldGFkYXRhICJwbGFjZW1lbnQvcmVnaW9uIiAiJHRva2VuIikKICAgIGxvY2FsIGluc3RhbmNlX3R5cGU9JChnZXRfbWV0YWRhdGEgImluc3RhbmNlLXR5cGUiICIkdG9rZW4iKQoKICAgIGlmIFsgLXogIiRpbnN0YW5jZV9pZCIgXSB8fCBbIC16ICIkcmVnaW9uIiBdIHx8IFsgLXogIiRpbnN0YW5jZV90eXBlIiBdOyB0aGVuCiAgICAgICAgZWNobyAiRmFpbGVkIHRvIGdldCByZXF1aXJlZCBtZXRhZGF0YSIKICAgICAgICByZXR1cm4gMQogICAgZmkKCiAgICBsb2NhbCB0YXJnZXRfc2l6ZT0kKGdldF90YXJnZXRfc2l6ZV9ieV9zdWZmaXggIiRpbnN0YW5jZV90eXBlIikKICAgIGVjaG8gIk9TOiAkb3NfdHlwZSwgSW5zdGFuY2U6ICRpbnN0YW5jZV90eXBlIC0+IFRhcmdldDogJHt0YXJnZXRfc2l6ZX1HQiIKCiAgICBsb2NhbCB2b2x1bWVfaWQ9JChhd3MgZWMyIGRlc2NyaWJlLWluc3RhbmNlcyAtLWluc3RhbmNlLWlkcyAkaW5zdGFuY2VfaWQgXAogICAgICAgIC0tcXVlcnkgJ1Jlc2VydmF0aW9uc1swXS5JbnN0YW5jZXNbMF0uQmxvY2tEZXZpY2VNYXBwaW5nc1s/RGV2aWNlTmFtZT09YCckZGV2aWNlJ2BdLkVicy5Wb2x1bWVJZCcgXAogICAgICAgIC0tb3V0cHV0IHRleHQgLS1yZWdpb24gJHJlZ2lvbikKCiAgICBpZiBbIC16ICIkdm9sdW1lX2lkIiBdIHx8IFsgIiR2b2x1bWVfaWQiID0gIk5vbmUiIF07IHRoZW4KICAgICAgICBlY2hvICJFcnJvcjogQ291bGQgbm90IGZpbmQgdm9sdW1lIGZvciBkZXZpY2UgJGRldmljZSIKICAgICAgICByZXR1cm4gMQogICAgZmkKCiAgICBsb2NhbCBjdXJyZW50X3NpemU9JChhd3MgZWMyIGRlc2NyaWJlLXZvbHVtZXMgLS12b2x1bWUtaWRzICR2b2x1bWVfaWQgXAogICAgICAgIC0tcXVlcnkgJ1ZvbHVtZXNbMF0uU2l6ZScgLS1vdXRwdXQgdGV4dCAtLXJlZ2lvbiAkcmVnaW9uKQoKICAgIGlmIFsgIiRjdXJyZW50X3NpemUiIC1nZSAiJHRhcmdldF9zaXplIiBdOyB0aGVuCiAgICAgICAgZWNobyAiVm9sdW1lIGFscmVhZHkgY29ycmVjdCBzaXplICgke2N1cnJlbnRfc2l6ZX1HQikiCiAgICAgICAgcmV0dXJuIDAKICAgIGZpCgogICAgZWNobyAiUmVzaXppbmcgdm9sdW1lIGZyb20gJHtjdXJyZW50X3NpemV9R0IgdG8gJHt0YXJnZXRfc2l6ZX1HQiIKICAgIGF3cyBlYzIgbW9kaWZ5LXZvbHVtZSAtLXZvbHVtZS1pZCAkdm9sdW1lX2lkIC0tc2l6ZSAkdGFyZ2V0X3NpemUgLS1yZWdpb24gJHJlZ2lvbgoKICAgICMgV2FpdCBmb3IgY29tcGxldGlvbgogICAgbG9jYWwgdGltZ
W91dD0zMDAKICAgIGxvY2FsIGVsYXBzZWQ9MAogICAgbG9jYWwgd2FpdF90aW1lPTIKCiAgICB3aGlsZSBbICRlbGFwc2VkIC1sdCAkdGltZW91dCBdOyBkbwogICAgICAgIGxvY2FsIHN0YXRlPSQoYXdzIGVjMiBkZXNjcmliZS12b2x1bWVzLW1vZGlmaWNhdGlvbnMgLS12b2x1bWUtaWRzICR2b2x1bWVfaWQgXAogICAgICAgICAgICAtLXF1ZXJ5ICdWb2x1bWVzTW9kaWZpY2F0aW9uc1swXS5Nb2RpZmljYXRpb25TdGF0ZScgLS1vdXRwdXQgdGV4dCAtLXJlZ2lvbiAkcmVnaW9uKQoKICAgICAgICBpZiBbWyAiJHN0YXRlIiA9ICJjb21wbGV0ZWQiIHx8ICIkc3RhdGUiID0gIm9wdGltaXppbmciIF1dOyB0aGVuCiAgICAgICAgICAgIGVjaG8gIlZvbHVtZSBtb2RpZmljYXRpb24gY29tcGxldGVkIgogICAgICAgICAgICBicmVhawogICAgICAgIGVsaWYgWyAiJHN0YXRlIiA9ICJmYWlsZWQiIF07IHRoZW4KICAgICAgICAgICAgZWNobyAiVm9sdW1lIG1vZGlmaWNhdGlvbiBmYWlsZWQiCiAgICAgICAgICAgIHJldHVybiAxCiAgICAgICAgZmkKCiAgICAgICAgc2xlZXAgJHdhaXRfdGltZQogICAgICAgIGVsYXBzZWQ9JCgoZWxhcHNlZCArIHdhaXRfdGltZSkpCiAgICAgICAgd2FpdF90aW1lPSQoKHdhaXRfdGltZSAqIDIpKQogICAgICAgIFsgJHdhaXRfdGltZSAtZ3QgMzAgXSAmJiB3YWl0X3RpbWU9MzAKICAgIGRvbmUKCiAgICBpZiBbICRlbGFwc2VkIC1nZSAkdGltZW91dCBdOyB0aGVuCiAgICAgICAgZWNobyAiVGltZW91dCB3YWl0aW5nIGZvciB2b2x1bWUgbW9kaWZpY2F0aW9uIgogICAgICAgIHJldHVybiAxCiAgICBmaQoKICAgICMgT1Mtc3BlY2lmaWMgZmlsZXN5c3RlbSByZXNpemUKICAgIGlmIFsgIiRvc190eXBlIiA9ICJhbDIwMjMiIF07IHRoZW4KICAgICAgICBncm93cGFydCAkZGV2aWNlIDEgfHwgewogICAgICAgICAgICBlY2hvICJGYWlsZWQgdG8gZXh0ZW5kIHBhcnRpdGlvbiIKICAgICAgICAgICAgcmV0dXJuIDEKICAgICAgICB9CgogICAgICAgIGxvY2FsIGZzX3R5cGU9JChsc2JsayAtZiAke2RldmljZX0xIHwgdGFpbCAtMSB8IGF3ayAne3ByaW50ICQyfScpCiAgICAgICAgY2FzZSAkZnNfdHlwZSBpbgogICAgICAgICAgICB4ZnMpCiAgICAgICAgICAgICAgICB4ZnNfZ3Jvd2ZzIC8gfHwgewogICAgICAgICAgICAgICAgICAgIGVjaG8gIkZhaWxlZCB0byByZXNpemUgWEZTIGZpbGVzeXN0ZW0iCiAgICAgICAgICAgICAgICAgICAgcmV0dXJuIDEKICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgIDs7CiAgICAgICAgICAgIGV4dDQpCiAgICAgICAgICAgICAgICByZXNpemUyZnMgJHtkZXZpY2V9MSB8fCB7CiAgICAgICAgICAgICAgICAgICAgZWNobyAiRmFpbGVkIHRvIHJlc2l6ZSBleHQ0IGZpbGVzeXN0ZW0iCiAgICAgICAgICAgICAgICAgICAgcmV0dXJuIDEKICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgIDs7CiAgICAgICAgICAgICopCiAgICAgICAgICAgICAgICBlY2hvICJVb
nN1cHBvcnRlZCBmaWxlc3lzdGVtOiAkZnNfdHlwZSIKICAgICAgICAgICAgICAgIHJldHVybiAxCiAgICAgICAgICAgICAgICA7OwogICAgICAgIGVzYWMKICAgIGZpCgogICAgZWNobyAiRUJTIHJlc2l6ZSBjb21wbGV0ZWQgc3VjY2Vzc2Z1bGx5IgogICAgcmV0dXJuIDAKfQoKIyBFeGVjdXRlIHJlc2l6ZQppZiAhIHJlc2l6ZV9lYnNfZm9yX2luc3RhbmNlOyB0aGVuCiAgICBlY2hvICJFQlMgcmVzaXplIGZhaWxlZCIKZmkKCiMgT1Mtc3BlY2lmaWMgYm9vdHN0cmFwCm9zX3R5cGU9JChkZXRlY3Rfb3MpCmlmIFsgIiRvc190eXBlIiA9ICJhbDIwMjMiIF07IHRoZW4KICAgIC91c3IvYmluL25vZGVhZG0gaW5pdApmaQo=" |
please keep the same approach as AL2023 without encoding to base64
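Until the script is inlined, the encoded value can be inspected locally by decoding it. The sketch below assumes a hypothetical helper name, `decode_user_data`, and uses only the first bytes of the payload above, padded so the sample decodes on its own; pass the full `user-data` string to see the whole script.

```shell
#!/usr/bin/env bash
# decode_user_data: hypothetical helper that prints the script hidden in a
# base64 user-data value.
decode_user_data() {
    printf '%s' "$1" | base64 --decode
}

# Sample: the opening bytes of the PR's payload, with padding added.
decode_user_data "IyEvYmluL2Jhc2gKc2V0IC1lCg=="
```

Needing a step like this just to review the script is a practical argument for the plain-text approach requested here.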
@herbertgoto hello, are you still planning to update this PR?
@chrismld ETA: 31/01/2026
Issue #:
kubernetes-sigs/karpenter#1988
Description of changes:
This blueprint shows how to automatically resize EBS volumes based on the EC2 instance type that Karpenter provisions. EBS volume size requirements differ among different instance types and this pattern ensures that each node gets an appropriately sized root volume without manual intervention.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choic