Design proposal on converting Standalone ENs to Managed ENs #244

Draft
wants to merge 6 commits into
base: main
from

Contributor

@osinstom osinstom commented Apr 29, 2025

Description

This PR adds a design proposal on converting Standalone ENs to Managed ENs.

Any Newly Introduced Dependencies

N/A

How Has This Been Tested?

N/A

Checklist:

  • I agree to use the APACHE-2.0 license for my code changes
  • I have not introduced any 3rd party dependency changes
  • I have performed a self-review of my code

**Step 0.** A user has already provisioned a set of Standalone ENs following the user guides and decides to scale out.
They don't need to have direct access to the Edge Orchestrator UI/API (could be different personas on-site vs. remote administrator).
Users don't need to perform any configuration on the Edge Orchestrator beforehand, but we assume that the Edge Orchestrator supports
the OS version of the Standalone ENs. The remote administrator configures a special user/role for SEN onboarding.

Member

Is there an assumption that the "user/role for SEN" is associated with a single project that will determine in which tenant the SEN is imported?

Contributor Author

Yes, the same assumption we have now for IO onboarding

3. The CLI tool performs initial OS prerequisites checks - for instance, it can check if the UUID and SN are properly set on the OS.
4. A user is prompted for inputs. The user will be asked for the orchestrator FQDN, proxy settings (the already set proxy settings should be presented to the user),
and SEN onboarding user credentials. The CLI tool should validate that all required parameters are set and the orchestrator should be reachable at this point.
Moreover, the CLI tool can ask for additional input from the user such as which Local Accounts, Site or metadata to configure for a given EN (user must select a configuration that

Member

minor: for any additional input that already exists in the Orchestrator, I believe it would be beneficial to present an interactive choice to the user, for example:

This is very important if the persona that uses the CLI tool does not have access to the orchestrator to retrieve the resourceIds that they need to input.

Contributor Author

yeah, I've thought about that, but we would need to expose API to Inventory on the southbound interface. Or let CLI communicate with northbound API too, which could be feasible but would probably need a different role. Thoughts @daniele-moro @pierventre ?

Contributor

we can use the nb-apis. not sure why we would need a different role though
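
To make the validation part of step 4 concrete, here is a minimal Python sketch of how the CLI could check that all required parameters are set before contacting the orchestrator. The field names (`orchestrator_fqdn`, `username`, `password`, `http_proxy`) and the helper `missing_inputs` are hypothetical and not taken from the proposal; the reachability check is left out.

```python
from dataclasses import dataclass


@dataclass
class OnboardingInputs:
    # Hypothetical field names; the proposal only lists the kinds of input.
    orchestrator_fqdn: str = ""
    username: str = ""
    password: str = ""
    http_proxy: str = ""  # optional, may stay empty


# Inputs that must be non-empty before the CLI proceeds (assumption).
REQUIRED = ("orchestrator_fqdn", "username", "password")


def missing_inputs(inputs: OnboardingInputs) -> list[str]:
    """Return the names of required inputs that are empty or whitespace."""
    return [name for name in REQUIRED if not getattr(inputs, name).strip()]
```

The CLI could re-prompt for each name returned by `missing_inputs` and continue only once the list is empty.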

11. Onboarding Manager reads the OS info and queries OS profile from Inventory based on the OS version and OS distro.
OS version should uniquely identify the OS profile. Note that this is true for EMT OS profiles, but may not be true for mutable OSes.
See open issues for more considerations on the support of mutable OSes.
12. The Onboarding Manager creates a dedicated cloud-init configuration for SEN. The current cloud-init library can be used.

Member

Are the static IPs (feature planned for 3.1 and ) included in this cloud-init? Are they collected via the CLI or auto-discovered from the system?

Contributor Author

That's a different cloud-init. This is a default cloud-init that EIM uses for EN provisioning. The other per-EN config can be provided via another cloud-init - @niket-intc is working on the design.

We can make CLI collect what day0 configuration should be applied (local accounts, per-EN config, sites, etc.) as user inputs

Contributor

do we really want this? CLI is becoming like an orchestrator. Let's focus on the reqs only


### Assumptions

- Customers will drive the onboarding process from a local developer machine, with desktop, keyboard and mouse.

Member

Is this still true? Reading below it seems like the CLI is executed directly on the SEN

Contributor Author

Kind of yes - I still assume that users SSH into ENs and there are no peripherals attached to ENs

The Instance desired state should be set to `RUNNING`, while the current state to `UNSPECIFIED` (until BM agents are up and running).
14. Once the onboarding is completed, the CLI tool receives the generated cloud-init configuration.
15. The CLI tool saves the cloud-init under the standard path that is used by EIM.
The user can either run the cloud-init manually or reboot the system to trigger the cloud-init.

Member

Can the CLI prompt the user to reboot the system (or invoke cloud-init directly) to complete the process?

Contributor Author

Yes, definitely doable

already exists on the orchestrator and OM should be responsible for validating if that configuration exists in the Inventory).
Also, the CLI tool retrieves hardware info (UUID, Serial Number), OS info (OS version and distro from `/etc/os-release`), the current Secure Boot and Full-Disk Encryption settings,
and the MAC/IP address of the management interface.
5. The CLI tool downloads the CA certificate from the Edge Orchestrator.

Contributor

Even to start with, we could move the certificate procurement (to provisioning machine) to an offline mechanism & put into the SEN rather than SEN making an outbound call. Implicit trust has always been questionable.

Contributor Author

That's true, but it may require a few more manual steps, which the CLI automates here.
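
As an illustration of the OS info collection described in step 4, the following Python sketch reads the distro and version from an os-release style file. It assumes the standard `ID` and `VERSION_ID` keys are the ones the onboarding request would carry; the proposal itself only says "OS version and distro from `/etc/os-release`". The function names are hypothetical.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dict."""
    info: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key.strip()] = value.strip().strip("\"'")
    return info


def read_os_info(path: str = "/etc/os-release") -> dict[str, str]:
    """Return the distro and version fields (assumed to be ID and VERSION_ID)."""
    with open(path, encoding="utf-8") as f:
        fields = parse_os_release(f.read())
    return {"distro": fields.get("ID", ""), "version": fields.get("VERSION_ID", "")}
```

Keeping the parser separate from the file read makes it easy to test against sample content without touching the node.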

ctl->>+pa: Get orchestrator CA certificate
pa-->>-ctl: [CA certificate]

ctl->>+kc: Retrieve JWT token for standalone onboarding user

Contributor

Why can we not use NIO registration instead of obtaining JWT using credentials here? The NIO flow does trust the deviceinit. Should reduce user interaction. I reckon security is not a concern as we allow NIO with both SB enabled/disabled.
OR
Since the provisioning machine is already well positioned to help user authenticate with the orch,
1.) can pull a bundle from the orch consisting of all artifacts needed + a one-time credential
2.) dump the bundle to the SEN to automate the rest of the flow.

Contributor

I agree - this is something that was discussed long time ago with PDM too when they wanted to opt-in for a lightweight FDO

Contributor

@krishnajs krishnajs left a comment

Just as BIT + pre-reg will be the default for EIM onboarding and provisioning, we should make Build-EMT-Sn-Import the default. The granularity is one EMT-S node, but in general multiple EMT-S nodes will be imported.


A Customer Journey for Open Edge Platform assumes that customers can manually deploy a set of Standalone
Edge Nodes (SEN) that can be onboarded to the Edge Orchestrator at a later stage, once a customer is ready to scale their deployment.
SENs are converted to managed Edge Nodes which, once onboarded, are fully owned by the Edge Orchestrator - customers

Contributor

it is better to rename SEN to EMT-S node

A Customer Journey for Open Edge Platform assumes that customers can manually deploy a set of Standalone
Edge Nodes (SEN) that can be onboarded to the Edge Orchestrator at a later stage, once a customer is ready to scale their deployment.
SENs are converted to managed Edge Nodes which, once onboarded, are fully owned by the Edge Orchestrator - customers
can manage them (e.g., install clusters, applications or perform Day2 OS updates) through the Edge Orchestrator UI and API.

Contributor

A cluster might already be there on the EN. It is better to leave it at "manage them".

Contributor

@pierventre pierventre Apr 30, 2025

if we build these images - I expect that co is able to import those clusters. Is that correct?

The Customer Journey is as follows:

1. A customer installs one or more Standalone Edge Nodes following the user guides.
2. The customer uses the SEN to deploy K8s clusters, applications, etc.

Contributor

During the first install, EMT-S already installs EMT and Kubernetes.

1. A customer installs one or more Standalone Edge Nodes following the user guides.
2. The customer uses the SEN to deploy K8s clusters, applications, etc.
3. The customer decides to scale out their deployment and onboard the SENs to the Edge Orchestrator.
4. Once SENs are onboarded, the customer starts to use the Edge Orchestrator to manage the SENs.

Contributor

It is better to create a new step - Customer installs the EMF onprem/on cloud to support the scale out deployment.

2. The customer uses the SEN to deploy K8s clusters, applications, etc.
3. The customer decides to scale out their deployment and onboard the SENs to the Edge Orchestrator.
4. Once SENs are onboarded, the customer starts to use the Edge Orchestrator to manage the SENs.
The customer can now use the Edge Orchestrator to manage the SENs, including installing clusters, applications, and performing Day2 OS updates.

Contributor

A cluster might already be there in most cases. So we can either leave it at "manage the node" or say "import the cluster and perform LCM of devices".


note over pa,inv: OS profiles created beforehand, no Host/Instance pre-registration

user->>user: Retrieves standalone onboarding user credentials

Contributor

Is there a separate credential for EMT-S node ? Is it not same for managed EN ?


user->>user: Retrieves standalone onboarding user credentials
user->>ctl: Logs in to the EN and invokes CLI tool

Contributor

We should also add the sequence diagram for the bulk onboard of EMT-S Nodes to the orch.


om->>inv: Query Inventory for OS resource by OS version and distribution

om->>om: Generate cloud-init for Standalone EN

Contributor

Maybe we need to specifically generate only the cloud-init that is needed for the EN agents to work with the orch. This ensures the cloud-init will not overwrite or undo what the user might have configured on the EN.

Contributor

yeah we should clarify what is needed for


om->>om: Validate JWT standalone onboarding role

om->>inv: Query Inventory for OS resource by OS version and distribution

Contributor

what if the OS version does not match the OS resource ?

Contributor

we should just return an error to the caller - so in general we can always retry
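
The lookup and error behavior discussed in this thread could be sketched as follows in Python. This is only an illustration under stated assumptions: `OSProfile`, `OSProfileNotFound`, and `find_os_profile` are hypothetical names, and the in-memory list stands in for an Inventory query by OS version and distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OSProfile:
    # Hypothetical, simplified view of an OS resource in Inventory.
    name: str
    distro: str
    version: str


class OSProfileNotFound(LookupError):
    """Raised when the reported OS does not map to exactly one OS profile."""


def find_os_profile(profiles: list[OSProfile], distro: str, version: str) -> OSProfile:
    """Return the single profile matching distro and version, else raise."""
    matches = [p for p in profiles if p.distro == distro and p.version == version]
    if len(matches) != 1:
        # Zero matches: unsupported OS. Several matches: the version is not
        # unique, which the proposal flags as possible for mutable OSes.
        raise OSProfileNotFound(
            f"expected exactly one OS profile for {distro} {version}, found {len(matches)}"
        )
    return matches[0]
```

Raising a dedicated error lets the caller report the mismatch to the CLI so the user can retry after the profile is fixed or created.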

2. An on-site user logs into the node and invokes the CLI tool that should already be installed on the EMT image.
The user enters into the interactive session with the CLI tool.
3. The CLI tool performs initial OS prerequisites checks - for instance, it can check if the UUID and SN are properly set on the OS.
4. A user is prompted for inputs. The user will be asked for the orchestrator FQDN, proxy settings (the already set proxy settings should be presented to the user),

Contributor

I was thinking... we need to provide a UX where customer can download a payload from the Orch instance (like what we download certs today). This payload can have certs, config and other artifacts for EMT-S node to basically connect to Orch (even OTP can be part of it). When customer copies this payload to all ENs (e.g. bulk scp) this agent on the EN (e.g. node agent) can start to communicate with Orch and move to normal operations.

Contributor

I really like this more than IO - it somehow mimics the FDO flow


om->>om: Generate cloud-init for Standalone EN

om->>inv: Create Host and Instance resource, Set EN as Onboarded and Provisioned

Contributor

current_state. I still believe we should enforce top-down registration as a prerequisite so that the desired and current states always evolve together

1. An on-site user retrieves user credentials for SEN onboarding from the remote administrator.
2. An on-site user logs into the node and invokes the CLI tool that should already be installed on the EMT image.
The user enters into the interactive session with the CLI tool.
3. The CLI tool performs initial OS prerequisites checks - for instance, it can check if the UUID and SN are properly set on the OS.

Contributor

do we plan to unify around the single CLI tool? Or is this a separate CLI ?

Note that the SEN will only require a subset of current cloud-init that is generated for remote provisioning.
13. Once OS profile is found and matched, all required HW/OS info provided and cloud-init generated, the OM creates Host and Instance resources.
The statuses should be set to Onboarded and Provisioned. Host's desired and current state should be set to `ONBOARDED`.
The Instance desired state should be set to `RUNNING`, while the current state to `UNSPECIFIED` (until BM agents are up and running).

Contributor

are you sure about this? I feel we should modify the hrm to touch only the modern status and avoid modifying the current state. I feel the current state should be owned only by the OM
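
For reference while discussing ownership of the current state, this Python sketch encodes the initial states exactly as step 13 of the proposal describes them. The class and field names are hypothetical simplifications of the Inventory resources, and whether the OM alone should own `current_state` remains the open question raised above.

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    UNSPECIFIED = "UNSPECIFIED"
    ONBOARDED = "ONBOARDED"


class InstanceState(Enum):
    UNSPECIFIED = "UNSPECIFIED"
    RUNNING = "RUNNING"


@dataclass
class Host:
    uuid: str
    serial_number: str
    desired_state: HostState
    current_state: HostState


@dataclass
class Instance:
    host_uuid: str
    desired_state: InstanceState
    current_state: InstanceState


def initial_resources(uuid: str, serial_number: str) -> tuple[Host, Instance]:
    """Build Host and Instance with the states proposed in step 13."""
    # Host: desired and current state are both ONBOARDED.
    host = Host(uuid, serial_number, HostState.ONBOARDED, HostState.ONBOARDED)
    # Instance: desired RUNNING, current UNSPECIFIED until BM agents report in.
    instance = Instance(uuid, InstanceState.RUNNING, InstanceState.UNSPECIFIED)
    return host, instance
```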


### Considerations on the user workflow

There are two major workflows we support - bottom-up (IO) and top-down (NIO, requires EN pre-registration).

Contributor

@pierventre pierventre Apr 30, 2025

we should not use IO/NIO. Let's substitute with just EMT-S onboarding/with/without pre-registration

In this ADR we selected the bottom-up approach as it requires fewer manual steps: the user logs in to the EN and
runs the entire workflow from the EN itself, without the need to access the Edge Orchestrator UI/API.

Also, the IO flow doesn't require any modifications to UI.

Contributor

can you elaborate here ?

### Considerations on scaling the SEN onboarding

The current design assumes that the user will SSH into a Standalone EN and trigger onboarding one by one (it can still be automated by a script).
Depending on the customers' requirements, we can provide a "Bulk Onboarding Tool" that allows onboarding multiple ENs at once.

Contributor

or maybe we can automate with ansible?
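
Short of a dedicated tool or Ansible, the "automated by a script" option mentioned in the proposal could look roughly like the Python sketch below. It only builds one `ssh` invocation per node. It assumes the onboarding CLI can run non-interactively, which the proposal does not state, and the remote command is passed in by the caller because the CLI name is not defined yet. The helper name and the default user are hypothetical.

```python
def build_ssh_commands(hosts: list[str], remote_command: str, user: str = "admin") -> list[list[str]]:
    """Build one ssh argument list per Standalone EN.

    remote_command is whatever non-interactive onboarding invocation the
    CLI ends up supporting (placeholder, not defined by the proposal).
    """
    return [["ssh", f"{user}@{host}", remote_command] for host in hosts]
```

Each returned list can be handed to `subprocess.run` in a loop, so a failure on one node does not stop onboarding of the others.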

### How to map local OS users to Local Accounts?

For now, we won't import existing OS users as local accounts. If we support NIO in the future,
we can add support for defining local accounts that should be configured on the pre-provisioned SEN that is onboarded.

Contributor

you can create the default-local account though if it is configured for the project

will not uniquely identify the OS image if that's a mutable OS (Ubuntu case) and Onboarding Manager is unable to query or validate mutable OS profile
based on info provided from SENs.
There are possible solutions to this:
- Users should be able to create their own OS Profiles that use the custom OS image that was used for Standalone ENs.

Contributor

shall we put some guardrails in the OS_profile ? For example prevent the provisioning or a/b updates based on the type of the profile

Contributor

maybe add STANDALONE_OS as a type and force the user to use that

@ajaythakurintel ajaythakurintel added the Proposal Identify a PR as a design proposal to be reviewed. label May 19, 2025
Labels
Proposal Identify a PR as a design proposal to be reviewed.
6 participants