
Launch Template support required for custom K8s node images #2200

@hanizang77


Launch Template Support Required for Custom K8s Node Images

Investigation Date: 2025-11-05
Discovery Context: Found during #2199 K8s image classification accuracy testing
Related Issue: #1647 Support GPU node group to Kubernetes cluster (blocker)


📋 Table of Contents

  1. Summary
  2. Problem Discovery
  3. Root Cause Analysis
  4. Proposed Solution
  5. Relationship with Related Issues
  6. References

📌 1. Summary

1.1 Problem

When an EKS NodeGroup is created with an explicit Debian AMI ID (ami-0e9ef8d785f2364dc), the resulting node is created with the AL2023_x86_64_STANDARD AMI type instead. The custom AMI ID parameter is validated but then ignored.

1.2 Root Causes

  1. Incomplete Code: imageId parameter validated but not assigned to request struct (provisioning.go:3724-3736)
  2. Missing Implementation: CB-Tumblebug doesn't implement Launch Template support
  3. AWS EKS Requirement: Custom AMI requires Launch Template for EKS Managed Node Groups

1.3 Impact

  • GPU Node Group Feature: Cannot be implemented (#1647)
  • Custom Image Support: NHN Cloud and other CSPs requiring custom images cannot be supported
  • Special Purpose AMIs: Cannot use GPU, ARM-optimized, or other specialized AMIs on AWS

1.4 Key Findings

  • CB-Tumblebug currently uses AMI Type approach (simple, no Launch Template needed)
  • Custom AMI ID requires Launch Template + bootstrap script on AWS EKS
  • Launch Template support is missing in current implementation

2. Problem Discovery

The problem was reproduced by creating an EKS NodeGroup with an explicit Debian AMI ID (ami-0e9ef8d785f2364dc). The node that was actually created used the AL2023_x86_64_STANDARD AMI type, not the requested AMI.

Test Request

curl -X 'POST' \
  'http://localhost:1323/tumblebug/ns/default/k8sCluster/k8scluster01/k8sNodeGroupDynamic' \
  -H 'Authorization: Basic ZGVmYXVsdDpkZWZhdWx0' \
  -d '{
    "imageId": "ami-0e9ef8d785f2364dc",
    "specId": "aws+ap-northeast-2+t3a.xlarge",
    "name": "k8sng01"
  }'

Actual Result

{
  "AmiType": "AL2023_x86_64_STANDARD",
  "Status": "ACTIVE"
}

3. Root Cause Analysis

3.1 CB-Tumblebug Code Status

File: src/core/infra/provisioning.go
Function: getK8sNodeGroupReqFromDynamicReq (lines 3707-3780)

// Line 3724-3736: Validates imageId but doesn't assign it
if strings.EqualFold(dReq.ImageId, "default") || strings.EqualFold(dReq.ImageId, "") {
    // do nothing
} else {
    // check if the image is available in the CSP
    _, err = resource.LookupImage(k8sClusterInfo.ConnectionName, dReq.ImageId)
    if err != nil {
        log.Error().Err(err).Msg("Failed to get the Image from the CSP")
        return emptyK8sNgReq, err
    }
}
// ❗ MISSING: k8sNgReq.ImageId = dReq.ImageId

Issue: dReq.ImageId is validated via LookupImage() but never assigned to k8sNgReq.ImageId.

3.2 Missing Launch Template Implementation

# Launch Template code search result
grep -r "LaunchTemplate\|launchTemplate" src/core/infra/provisioning.go
# → No matches found

Current Status: CB-Tumblebug does not use Launch Template.

Critical Issue: Fixing the code bug in 3.1 (assigning imageId) would not be enough on its own. As described in 3.3, a custom AMI for an EKS Managed Node Group can only be specified through a Launch Template, so without one the AMI ID never takes effect and the node group falls back to the default AMI type.

Data Flow:

User Request (imageId: ami-xxx)
  ↓
CB-Tumblebug (Validates only, doesn't assign)
  ↓
Spider (Passed without imageId)
  ↓
AWS EKS API (No Launch Template → only recognizes amiType)
  ↓
Default AMI Type used (AL2023_x86_64_STANDARD)

3.3 AWS EKS Managed Node Group AMI Specification Methods

AWS EKS Managed Node Group supports two AMI specification methods:

Method 1: AMI Type (Default, current CB-Tumblebug implementation)

  • Specify only amiType parameter (e.g., AL2023_x86_64_STANDARD)
  • EKS automatically selects the latest EKS-optimized AMI
  • No Launch Template needed
  • Simplest and recommended approach

Method 2: Custom AMI ID (Advanced, not implemented)

  • Launch Template is required
  • Specify AMI ID + UserData (bootstrap script) in Launch Template
  • Reference Launch Template when creating NodeGroup
  • Must provide bootstrap script manually (/etc/eks/bootstrap.sh)

Important Constraints (AWS Official Documentation):

The following fields can't be set in the API if you specify an AMI ID:

  • amiType
  • releaseVersion
  • version

To use a custom AMI with an EKS managed node group, simply specify the AMI ID in your launch template. EKS will specify this AMI for the node group's auto scaling group.


4. Proposed Solution

4.1 Immediate Action

Since simply assigning imageId doesn't solve the problem without Launch Template support, the immediate action should be:

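// Add validation in provisioning.go:3724-3736
// Reject any explicit image ID until Launch Template support exists.
if !strings.EqualFold(dReq.ImageId, "default") && dReq.ImageId != "" {
    // Go convention: error strings start lowercase; include the offending value.
    return emptyK8sNgReq, fmt.Errorf("custom image %q is not supported: custom AMI requires Launch Template support (not yet implemented)", dReq.ImageId)
}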

Rationale:

  • Prevents user confusion by stating explicitly that the feature is not supported
  • Avoids silently ignoring the parameter
  • Documents the limitation clearly

4.2 Long-term Solution

Create a separate issue for Launch Template feature implementation:

  • Research similar features in CSPs other than AWS (GCP, Azure, etc.)
  • Analyze impact of Spider interface changes
  • Design Launch Template abstraction for multi-cloud support
  • Prerequisite work for GPU node group support (#1647)

5. Relationship with Related Issues

GPU Node Group Support Issue (#1647)

Requirement: K8s cluster support for GPU workloads

For AWS EKS:

  • AWS provides GPU-enabled EKS-optimized AMIs (AL2 GPU, AL2023 GPU)
  • A custom AMI requires a Launch Template
  • A custom GPU AMI must have NVIDIA drivers/CUDA pre-installed

Core Problem:

  • Without the ability to specify a custom AMI, GPU node creation is impossible on some CSPs (see the comments on #1647)
  • NHN Cloud: Requires custom image creation via Image Builder
  • Other CSPs similarly require custom images or specific GPU image specification

Resolution Necessity: Custom image support via Launch Template must be implemented before GPU features can be supported.


6. References
