
Add minimal EKS Auto Mode test tooling #4285

Open
pcnudde wants to merge 1 commit into NVIDIA:main from pcnudde:feature/aws-k8s

Conversation

pcnudde (Collaborator) commented Mar 10, 2026

Summary

  • add a minimal EKS Auto Mode cluster config under tests/tools/aws/eks
  • add a tiny non-FLARE workload manifest to verify pod scheduling on the cluster
  • document the direct eksctl and kubectl workflow in the README

Testing

  • not run (documentation and manifest additions only)

greptile-apps bot (Contributor) commented Mar 10, 2026

Greptile Summary

This PR adds minimal EKS Auto Mode test tooling under tests/tools/aws/eks/, consisting of an eksctl cluster config (cluster.yaml), a lightweight test workload manifest (inflate.yaml), and a README.md documenting the end-to-end workflow. The changes are documentation and manifests only — no FLARE code is touched.

Key observations:

  • inflate.yaml is well-formed: uses the pause image from a pinned ECR tag, includes a sensible pod security context, and correctly targets EKS Auto Mode nodes via the eks.amazonaws.com/compute-type: auto node selector.
  • cluster.yaml does not pin a Kubernetes version, so the provisioned version will vary with the eksctl default in use at the time — worth pinning for reproducibility.
  • The README kubectl get events -w --sort-by '.lastTimestamp' command is slightly misleading: --sort-by applies only to the initial list; newly streamed events are unsorted.
  • No secrets or credentials are included. No cost guidance is included either; users should be aware that an EKS control plane and Auto Mode nodes incur real AWS charges.

Confidence Score: 5/5

  • Safe to merge — documentation and manifest additions only, no runtime code affected.
  • All three files are new additions (no existing code modified). The only findings are minor style suggestions (unpinned K8s version, misleading kubectl flag combination) that do not affect correctness or security.
  • No files require special attention; minor suggestions on cluster.yaml and README.md are low priority.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/tools/aws/eks/README.md | Clear and well-structured quickstart guide; one minor issue with --sort-by being silently ignored when combined with --watch in the events command. |
| tests/tools/aws/eks/cluster.yaml | Minimal valid eksctl ClusterConfig for EKS Auto Mode; no Kubernetes version is pinned, which makes the provisioned version depend on the eksctl default at the time of use. |
| tests/tools/aws/eks/inflate.yaml | Well-formed test Deployment using the lightweight pause image; includes a reasonable security context, correct Auto Mode node selector, and a pinned image tag. |

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant eksctl
    participant EKS as EKS Control Plane
    participant Karpenter as Auto Mode (Karpenter)
    participant Node as EC2 Node

    Dev->>eksctl: eksctl create cluster -f cluster.yaml
    eksctl->>EKS: Provision EKS Auto Mode cluster
    EKS-->>Dev: Cluster ready (0 nodes)

    Dev->>EKS: kubectl apply -f inflate.yaml
    EKS->>Karpenter: Unschedulable pod detected
    Karpenter->>Node: Provision EC2 node
    Node-->>EKS: Node joins cluster
    EKS-->>Dev: Pod scheduled & running

    Dev->>EKS: kubectl delete -f inflate.yaml
    EKS->>Karpenter: No pending workloads
    Karpenter->>Node: Deprovision node (scale to 0)

    Dev->>eksctl: eksctl delete cluster -f cluster.yaml --wait
    eksctl->>EKS: Delete cluster & resources
    EKS-->>Dev: Cluster deleted

Last reviewed commit: 97cfbbf
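
The flow in the sequence diagram can be sketched as a small shell helper. This is only an illustration built from the commands shown in the diagram: the `run` wrapper, the `DRY_RUN` variable, and the `eks_smoke_test` function name are hypothetical and not part of the PR, and the `kubectl get pods -o wide` step is an added assumption about how one might inspect scheduling.

```shell
#!/usr/bin/env bash
# Sketch of the create -> test -> teardown flow from the sequence diagram.
# run(), DRY_RUN, and eks_smoke_test are illustrative names, not part of the PR.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

eks_smoke_test() {
  # Create the Auto Mode cluster; per the diagram it starts with 0 nodes.
  run eksctl create cluster -f cluster.yaml || return 1
  # Apply the test workload, which should trigger node provisioning.
  run kubectl apply -f inflate.yaml || return 1
  # Snapshot pod placement (pods may still be Pending right after apply).
  run kubectl get pods -o wide || return 1
  # Remove the workload so Auto Mode can deprovision the node.
  run kubectl delete -f inflate.yaml || return 1
  # Delete the cluster so it stops incurring AWS charges.
  run eksctl delete cluster -f cluster.yaml --wait
}
```

Calling `DRY_RUN=1 eks_smoke_test` prints the five commands without touching AWS. Note that this sketch returns early on a failed step without deleting the cluster, so a real run would still need manual cleanup in that case.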


```bash
kubectl apply -f inflate.yaml
kubectl get events -w --sort-by '.lastTimestamp'
```

--sort-by is ignored when combined with --watch

--sort-by only applies to the initial list request; once --watch mode streams new events they arrive in arrival order regardless of the flag. This means the command gives the false impression that output will remain sorted, which can be confusing when debugging node provisioning.

Consider splitting into two separate commands or dropping the --sort-by flag when watching:

Suggested change
kubectl get events -w --sort-by '.lastTimestamp'
kubectl get events -w
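
For the other option mentioned in this comment, splitting into two commands, a minimal sketch might look like the following. The `watch_events` function and the `KUBECTL` override variable are hypothetical names introduced here for illustration; the two `kubectl get events` invocations use only the flags already discussed above.

```shell
#!/usr/bin/env bash
# Sketch: take a sorted snapshot first, then stream new events separately.
# watch_events and KUBECTL are illustrative names, not part of the PR.

watch_events() {
  # One-shot listing of existing events, ordered by last timestamp.
  "${KUBECTL:-kubectl}" get events --sort-by=.lastTimestamp
  # Stream subsequent events in arrival order (no sorting is implied).
  "${KUBECTL:-kubectl}" get events -w
}
```

Keeping the two steps separate makes it explicit which part of the output is sorted and which part is simply streamed.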

Comment on lines +1 to +10
# Edit `name` and `region` if you want different values.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: nvflare-auto-test
  region: us-west-2

autoModeConfig:
  enabled: true

No Kubernetes version pinned

Without a version field, eksctl will silently pick its own default Kubernetes version, which can change across eksctl releases. This makes it hard to reproduce a specific environment, and a future eksctl upgrade could provision a different version than what was tested.

Consider pinning an explicit version, e.g.:

metadata:
  name: nvflare-auto-test
  region: us-west-2
  version: "1.31"
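
If pinning is adopted, a lightweight guard could flag a config that lacks a version. The sketch below is a plain text check rather than a YAML parser: it assumes the `version` key is indented under `metadata` as in the snippet above, and the `has_pinned_version` helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed if the file contains an indented, non-empty `version:` key.
# has_pinned_version is an illustrative helper, not part of the PR.
# It matches text only and does not verify that the key sits under `metadata`.

has_pinned_version() {
  grep -Eq '^[[:space:]]+version:[[:space:]]*[^[:space:]#]' "$1"
}
```

A possible use before creating the cluster: `has_pinned_version cluster.yaml || echo "cluster.yaml does not pin a Kubernetes version" >&2`.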
