Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
What do you want us to build?
Upgrade the systemd version in the AWS EKS optimized AMI to a version newer than v239
Which service(s) is this request for?
This is for EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
We were looking into configuring graceful shutdown for Kubernetes nodes. The feature is enabled by default starting with Kubernetes v1.21 (https://kubernetes.io/blog/2021/04/21/graceful-node-shutdown-beta/). Our clusters are running v1.24.
After enabling the feature and configuring ShutdownGracePeriod and ShutdownGracePeriodCriticalPods via the kubelet configuration options, graceful shutdown does not work as expected. When Karpenter (which we use for cluster scaling) detects that a node is empty, it terminates the node, and the node shuts down immediately without any grace period.
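For reference, a minimal KubeletConfiguration sketch with the two fields we set (the durations below are illustrative values, not our exact settings):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Total time the node delays shutdown so pods can terminate (example value)
shutdownGracePeriod: 30s
# Portion of shutdownGracePeriod reserved for critical pods (example value)
shutdownGracePeriodCriticalPods: 10s
```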
While investigating, we found several references indicating that the problem is caused by the systemd version on the node. We use the AWS EKS optimized Linux AMI for our nodes, and its systemd version is v219.
According to the links below, this appears to be fixed in systemd versions newer than v239.
- GracefulNodeShutdown not work kubernetes/kubernetes#107043 (comment)
- feat: add support for shutdownGracePeriod and shutdownGracePeriodCriticalPods kubernetes-sigs/karpenter#248 (comment)
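For anyone hitting the same behavior, a quick way to inspect a node is something along these lines (standard systemd commands; output will vary by AMI and configuration):

```sh
# systemd version shipped on the node (v219 on the current EKS optimized AMI)
systemctl --version | head -n1

# active inhibitor locks; when graceful shutdown is working, kubelet should hold
# a shutdown lock in "delay" mode
systemd-inhibit --list

# maximum delay logind will honor for inhibitor locks
busctl get-property org.freedesktop.login1 /org/freedesktop/login1 \
  org.freedesktop.login1.Manager InhibitDelayMaxSec
```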
Are you currently working around this issue?
How are you currently solving this problem?
No workaround is known at this point.
Additional context
Anything else we should know?
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)