docs: add design proposal for integrating Kubernetes into EMT #273

Open
wants to merge 13 commits into
base: main
from

Conversation

hyunsun (Contributor) commented May 5, 2025

Description

Add design proposal for integrating Kubernetes into Edge Microvisor Toolkit.

Any Newly Introduced Dependencies

Please describe any newly introduced 3rd party dependencies in this change. List their name, license information and how they are used in the project.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Checklist:

  • I agree to use the APACHE-2.0 license for my code changes
  • I have not introduced any 3rd party dependency changes
  • I have performed a self-review of my code


Manual cluster creation is the existing method in which users create a cluster by selecting a template and hosts. Before discussing the manual cluster creation workflow, it is important to understand the coupling between the EMT image version and the cluster template that integrating Kubernetes into EMT introduces. For instance, creating a K3s cluster with a K3s v1.32 template may not work as expected on an EMT machine provisioned with K3s v1.30. Such discrepancies can occur when Edge Orchestration undergoes multiple upgrades, resulting in a mix of cluster template versions and EMT machine versions. Although this issue affects only K3s on EMT, that combination is the primary use case, so it is important to address it properly in the user workflow. There are a few options to mitigate this issue.

**Option 1)** Restrict cluster creation to onboarded hosts only, excluding provisioned hosts. The Cluster Manager automatically selects the appropriate EMT version and asks the Infra Manager to create an instance, which installs the OS and Kubernetes. This could be the simplest option.
Contributor

Doable by looking at the content of the manifest.json associated with an OSProfile. It lists basically all the package versions.

Contributor Author

Thanks! This is the part I wanted confirmation on.

Contributor

Who would be doing that? The UI? Cluster Orch? Some component must parse the manifest and infer the installed K3s version. Please clarify in the proposal.

**1. Automatically when a host is deauthorized**: Users can opt to delete the cluster when the host is deauthorized, with this option enabled by default during cluster creation. The Cluster Manager will automatically delete the cluster when all hosts within it are deauthorized.
**2. Manually through a direct request to the Cluster Manager**: Users can initiate a cluster deletion by making a direct API call to the Cluster Manager.
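
The automatic deletion rule above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name, its parameters, and the use of plain host identifiers are hypothetical and are not part of the Cluster Manager API.

```python
def should_auto_delete(cluster_hosts, deauthorized_hosts, delete_on_deauthorize=True):
    """Return True when the cluster should be deleted automatically.

    Assumed rule (from the text above): the user opted in, which is the
    default at cluster creation, and every host in the cluster has been
    deauthorized.
    """
    hosts = set(cluster_hosts)
    # An opted-out cluster, or a cluster with no hosts, is never auto-deleted.
    if not delete_on_deauthorize or not hosts:
        return False
    # Every cluster host must appear in the deauthorized set.
    return hosts <= set(deauthorized_hosts)
```

With this rule, a two-node cluster is kept while only one of its hosts is deauthorized and becomes eligible for deletion once the second host is deauthorized as well.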

For manual deletion, the handling of cluster nodes depends on the approach taken for manual creation. If manual creation is restricted to onboarded hosts only (Option 1), deleting a host should deauthorize the cluster nodes, making users re-onboard the hosts for reuse. If either Option 2 or Option 3 is chosen, deleting a host should only clean up Kubernetes, which aligns with the current behavior.
Contributor

I don't fully understand this paragraph. "deleting a host should deauthorize the cluster nodes, making users re-onboard the hosts for reuse" -> how does this relate to Option 1 or Option 3?

Contributor

It is counter-intuitive: I would expect that deleting a cluster node automatically deletes the Instance and Host in EIM, so that they can be onboarded again.

Contributor Author

See updates, I tried to explain it better. In short, it's for consistency across workflows.

@ajaythakurintel ajaythakurintel added the Proposal label (Identify a PR as a design proposal to be reviewed.) May 9, 2025

## Abstract

The Edge Microvisor Toolkit (EMT) is an operating system specifically designed for hosting edge workloads, streamlining traditional general-purpose operating systems by including only the essential components needed to run container-based applications. Our experience from previous releases has demonstrated that EMT's design principles—image-based deployment and immutable root filesystem—enhance the reliability and consistency of cluster creation compared to even well-maintained general-purpose operating systems, such as Ubuntu.
Contributor

By the way, in the long term, will these be the only images we plan to support for EMT/EMT-S? cc @krishnajs

@hyunsun hyunsun requested a review from eoghanlawless May 10, 2025 17:03

The first option introduces a streamlined process for users seeking simplicity or bulk registration of hosts, or both. Both the Web UI and CLI will offer a new toggle for `Create Cluster Automatically` (potentially combined with `Provision Automatically`, pending final design decisions) during registration. This option is only valid when `Provision Automatically` is enabled. When enabled, it automatically creates a single-node cluster for each host.

Users can provide a default cluster template for all hosts in the registration request, with the flexibility to override it for specific hosts if needed. For EMT machines, once a specific OS profile is selected, only cluster templates compatible with that EMT version will be displayed. This includes all RKE2 templates and K3s templates that match the K3s version embedded in the selected EMT image.
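
A minimal sketch of the template selection described above, assuming a simplified data model. The `ClusterTemplate` fields, the `"k3s"` and `"rke2"` flavor strings, and both function names are hypothetical; the proposal does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTemplate:
    name: str
    flavor: str     # assumed values: "k3s" or "rke2"
    k8s_minor: str  # assumed format: "1.30"


def compatible_templates(templates, embedded_k3s_minor):
    """Keep all RKE2 templates plus K3s templates matching the embedded K3s version."""
    return [
        t for t in templates
        if t.flavor == "rke2"
        or (t.flavor == "k3s" and t.k8s_minor == embedded_k3s_minor)
    ]


def template_for_host(host, default_template, overrides):
    """A per-host override wins; otherwise the request-wide default applies."""
    return overrides.get(host, default_template)
```

For an EMT image embedding K3s 1.30, `compatible_templates` would hide a K3s 1.32 template but still list every RKE2 template, which matches the filtering rule in the paragraph above.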
Contributor

How do you override the template with a specific one? Can we have a drop-down in the registration instead of a toggle, where the drop-down lets you pick a template of any K8s type?

Contributor Author

That is more about UX design. Patricia has an initial design, so feel free to reach out to her, but the "toggle" is for the user to enable or disable automatic cluster creation. When it is enabled, the user should provide more information in subsequent steps of registration.

- **EMT Image Version**: Specifies the operating system and embedded software, including the K3s version, on the edge device.
- **Cluster Template**: Defines the configuration for Kubernetes cluster creation, including the Kubernetes flavor, version, and custom control plane settings.

For instance, attempting to create a K3s cluster using a template for K3s v1.32 on an EMT machine embedding K3s v1.30 may result in compatibility issues. Such mismatches can occur when Edge Orchestration undergoes multiple upgrades, leading to a mix of cluster templates and EMT machines with varying versions. While this issue is specific to the K3s-on-EMT scenario, it is critical to address it effectively in the user workflow, as this represents the most common use case for EMF.
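
The mismatch described above could be detected with a simple version comparison. The sketch below is illustrative only: the function names, the version-string format (for example `v1.32.4+k3s1`), and the rule that major and minor versions must match are assumptions, not behavior defined by the proposal.

```python
import re

# Assumed version format, e.g. "v1.32.4+k3s1"; only major.minor is compared.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)")


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a Kubernetes-style version string."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(embedded_k3s: str, template_k3s: str) -> bool:
    """Assumed rule: a K3s template fits an EMT image when major.minor match."""
    return major_minor(embedded_k3s) == major_minor(template_k3s)
```

Under this assumed rule, a K3s v1.32 template would be rejected for an EMT machine embedding K3s v1.30, while patch-level differences within v1.30 would be accepted.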
Contributor

I would suggest including A/B upgrades in the discussion or creating a new ADR. In this logic CO is not involved, so what is the impact on our ms? cc @daniele-moro, who is working on day2 improvements

Contributor Author

I have added an "Upgrade Cluster" section. There is a small task for the Infra Manager to address, which involves preventing an unintended K3s version update through an EMT update. Note that for the 3.1 release, we will support neither multiple K3s versions nor K8s version upgrades in CO; we're still waiting for CAPI to implement the foundation for in-place upgrades. So upgrades can wait for a future release.

@hyunsun hyunsun self-assigned this May 14, 2025
@hyunsun hyunsun marked this pull request as ready for review May 14, 2025 15:42

#### Automatic Cluster Creation

The first option introduces a streamlined process for users seeking simplicity or bulk registration of hosts, or both. Both the Web UI and CLI will offer a new toggle for `Create Cluster Automatically` (potentially combined with `Provision Automatically`, pending final design decisions) during registration. This option is only valid when `Provision Automatically` is enabled. When enabled, it automatically creates a single-node cluster for each host.
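
The dependency between the two toggles can be illustrated with a small validation sketch. Everything here is hypothetical: the `HostRegistration` type, its field names, and the `<host>-cluster` naming scheme are assumptions made for the example, not the registration API.

```python
from dataclasses import dataclass


@dataclass
class HostRegistration:
    host: str
    provision_automatically: bool = False
    create_cluster_automatically: bool = False


def validate(reg: HostRegistration) -> None:
    """Reject automatic cluster creation without automatic provisioning."""
    if reg.create_cluster_automatically and not reg.provision_automatically:
        raise ValueError(
            f"{reg.host}: 'Create Cluster Automatically' requires "
            "'Provision Automatically' to be enabled"
        )


def planned_clusters(registrations):
    """Plan one single-node cluster per host that has both toggles enabled."""
    clusters = []
    for reg in registrations:
        validate(reg)
        if reg.create_cluster_automatically:
            # Hypothetical naming scheme: one cluster named after its only host.
            clusters.append({"name": f"{reg.host}-cluster", "nodes": [reg.host]})
    return clusters
```

In a bulk registration, hosts with only `Provision Automatically` enabled are provisioned without a cluster, and each host with both toggles enabled gets its own single-node cluster.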
Contributor

For some deployments (like the minimal OXM deployment) we won't deploy cluster orchestration. In that case we need to configure the UI so that it does not show the new toggle for CO. CC: @teone

@hyunsun hyunsun changed the title from "[WIP] docs: add design proposal for integrating Kubernetes into EMT" to "docs: add design proposal for integrating Kubernetes into EMT" May 15, 2025
@krishnajs krishnajs left a comment

@hyunsun @gcgirish and @adorney99, we had a long discussion in the EIM team, and here is the summary of the feedback, specifically showing the difference between the OXM use case and the end-customer use case with EMF. We need to reconcile on this.

image

Contributor

gcgirish commented May 15, 2025

> @hyunsun @gcgirish and @adorney99 we had a long discussion in EIM team and here is the summary of the feedback specifically showing the difference between OXM use case and End customer use case with EMF. We need to reconcile on this
>
> image

@krishnajs : Thanks. Two questions.

  1. What is OXM?
  2. I believe MT is abbreviation for multi-tenancy. How does MT come into picture with EMT-S? I see it is ticked green in EMF(OXM profile).

@krishnajs

> > @hyunsun @gcgirish and @adorney99 we had a long discussion in EIM team and here is the summary of the feedback specifically showing the difference between OXM use case and End customer use case with EMF. We need to reconcile on this
> >
> > image
>
> @krishnajs : Thanks. Two questions.
>
> 1. What is OXM?

Original equipment manufacturer like Dell, Lenovo, Advantech.

> 2. I believe `MT` is abbreviation for multi-tenancy. How does MT come into picture with EMT-S? I see it is ticked green in EMF(OXM profile).

Yes, it's the multi-tenancy GW. We don't think OXM would need this (not sure yet). But for now, what we are saying is that we will not disable or change MT/MT Gateway in 3.1.

Contributor

@Paulina-Osikoya Paulina-Osikoya left a comment

@hyunsun I have read and reviewed the doc of the proposed implementation.

Labels
Proposal Identify a PR as a design proposal to be reviewed.

8 participants