Commit 645e9f0

ADR for separation of onboarding and provisioning flow (#1150)
Signed-off-by: Srinivasamurthy, Ramakrishna <ramakrishna.srinivasamurthy@intel.com>
1 parent b0f27ab commit 645e9f0

1 file changed

Lines changed: 173 additions & 0 deletions

# Design Proposal: Split onboarding and OS provisioning flows in the EIM

Author(s): Edge Manageability Architecture Team

Last updated: 17th Nov 2025

## Abstract

Today, onboarding an edge node requires going through the provisioning workflow,
which installs the operating system and the required agents so that the edge
node is ready to be managed by the EMF orchestrator. This is mandatory even when
the goal is only to test day2 flows such as vPro features, cluster
orchestration, or app orchestration. If a customer or user has an edge node that
is already provisioned with an operating system meeting EMF requirements, going
through the entire OS provisioning workflow again is unnecessary overhead.
Customers might also have their own OS provisioning process for installing the
OS on the edge node; in that case EMF should support onboarding that edge node
to the EMF orchestrator so that day2 workflows can be used. It is therefore
necessary to split the onboarding and provisioning flows, and to make the choice
configurable: when onboarding an edge node to EMF, the end user should be able
to go through the onboarding flow alone or onboarding with OS provisioning.

## Requirements

When a user chooses to onboard an edge node to the EMF orchestrator, there
should be an option to opt for the provisioning flow or skip it entirely. When
the user opts for onboarding with OS provisioning, the existing workflow applies
and the node goes through the entire Day0 workflow (OS installation, edge node
installation and configuration). The OS provisioning workflow differs between
the two EMF-supported operating systems (EMT and Ubuntu) because EMT, being
immutable, is pre-bundled with the edge node agents.

### Existing provisioning workflow for Ubuntu includes the steps below

1. Install Ubuntu by downloading the Canonical base server image.

2. Upgrade to the required base kernel as part of the provisioning flow to
   support Intel platform features. The required base kernels for Intel
   platforms such as ADL and RPL are:

   - Ubuntu 22.04 -> Kernel 6.8.x

   - Ubuntu 24.04 -> Kernel 6.11.x

3. Install the edge node agents as a post-installation step. DKAM curates the
   installer script with the required EMF-compatible versions of the EN agents
   (using the EN manifest file) along with their configurations. It configures
   the apt package manager with the EMF release service, where the edge node
   agent Debian packages are hosted.

4. Edge node agent configurations include resource manager endpoints, Keycloak
   credentials, edge node proxy configurations, the edge node agents' log
   rotation policies, etc.

5. Start the edge node agents as systemd services and enable them to start on
   reboot.

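
The kernel requirement in step 2 can be expressed as a simple version check.
The following is a minimal sketch, not the provisioning code itself: the
`REQUIRED_KERNEL` table restates the mapping above, while the function names and
the choice to compare only the major and minor numbers are illustrative
assumptions.

```python
import re

# Minimum base kernel (major, minor) per Ubuntu release, as listed above.
REQUIRED_KERNEL = {
    "22.04": (6, 8),
    "24.04": (6, 11),
}


def parse_kernel(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_kernel_upgrade(ubuntu_version: str, kernel_release: str) -> bool:
    """Return True when the running kernel is older than the required base kernel."""
    required = REQUIRED_KERNEL[ubuntu_version]  # KeyError for unsupported releases
    return parse_kernel(kernel_release) < required
```

On an edge node, the running kernel string could be obtained with
`platform.release()` from the Python standard library and passed as
`kernel_release`.
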
### EMF orchestrator side steps to skip the provisioning flow

1. Users should be able to register edge nodes with the EMF orchestrator by
   opting in to or skipping the provisioning flow. When users choose onboarding
   without OS provisioning, they must complete minimal steps on the edge node to
   register with the EMF orchestrator. The onboarding steps that must be
   performed on the edge node include:

   - Upgrade the installed kernel to the base kernel version required for the
     Intel supported platforms, as defined in the provisioning flow above.

   - Install the edge node agents.

   - Install the additional system packages needed on the edge node to run the
     edge node agents.

   - Provide the edge node agent configuration files, covering the required
     versions of the EN agents that are compatible with EMF, the infra-manager
     endpoints, Keycloak credentials, etc. This configuration also specifies
     which agents are to be installed on the edge node based on the EMF
     capabilities (EIM with AMT, app orchestration, cluster orchestration,
     observability).

2. Run the new onboarding agent to perform non-interactive onboarding and get
   the host-specific Keycloak credentials for the edge node agents.

3. Start the other agents once the Keycloak credentials are received from the
   EMF orchestrator, and enable them to start on reboot.

4. If the host resource is associated with a custom config (cloud-init), there
   should be a way to run the cloud-init steps on the edge node as a
   post-onboarding step.

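
How the set of agents might be derived from the enabled EMF capabilities can be
sketched as follows. The capability keys and agent names here are hypothetical
placeholders, not the names used in the EN manifest; the only behaviour taken
from the text above is that the agents to install depend on the enabled
capabilities.

```python
# Hypothetical capability -> agent mapping; real names would come from the EN manifest.
AGENTS_BY_CAPABILITY = {
    "eim-amt": ["platform-manageability-agent"],
    "app-orchestration": ["app-agent"],
    "cluster-orchestration": ["cluster-agent"],
    "observability": ["telemetry-agent"],
}

# Agents assumed (for this sketch) to be needed regardless of capabilities.
BASE_AGENTS = ["onboarding-agent", "node-agent"]


def select_agents(enabled_capabilities: list[str]) -> list[str]:
    """Return the ordered, de-duplicated list of agents to install.

    Capabilities that are not in the mapping are ignored.
    """
    selected = list(BASE_AGENTS)
    for capability in enabled_capabilities:
        for agent in AGENTS_BY_CAPABILITY.get(capability, []):
            if agent not in selected:
                selected.append(agent)
    return selected
```

Keeping the mapping as data rather than branching logic means a new capability
only needs a new table entry.
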
## Scope and Implementation plan

High-level tasks in EMF to make the provisioning workflow optional:

1. Device discovery agent - Build a Debian package for the device discovery
   agent and include it in the agent installer script. The agent performs the
   (non-interactive) onboarding and gets the required Keycloak credentials to
   the edge node. Installation of the other agents should not start until device
   discovery completes the onboarding.

2. DKAM - Should curate and host the installer script in the tinker-nginx
   service. It should also read the configuration listing the enabled EMF
   capabilities (app orchestration, cluster orchestration, observability) and,
   based on that configuration, include the respective agent installations. It
   should include the device discovery agent.

3. Onboarding manager - Should skip the creation of the Tinkerbell workflow
   (the provisioning flow) if the edge node is registered with the skip
   provisioning flow option. It should update the inventory with the required
   instance resource fields, such as the provisioning status and status
   indicator fields, and create the instance resource mapped to the OS resource
   of the edge node.

4. API-v2 and inventory - Changes to include a new field, skip provisioning
   flow, in the host resource.

5. Orch-cli/Infra web-ui - Changes to device registration to include the new
   field, skip provisioning flow. By default, skip provisioning flow is set to
   false.

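
The onboarding manager behaviour in task 3 amounts to a branch on the new host
field. The sketch below is illustrative only: `Host`, `skip_provisioning`, and
the returned action names are assumptions made for this example and do not
reflect the actual inventory schema or onboarding manager code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Simplified stand-in for the inventory host resource."""

    uuid: str
    # Hypothetical name for the new field; defaults to False as proposed above.
    skip_provisioning: bool = False


def plan_onboarding(host: Host) -> list[str]:
    """Return the ordered actions the onboarding manager would take for a host."""
    actions = ["create_instance_resource"]
    if host.skip_provisioning:
        # OS is already installed: no Tinkerbell workflow, only status updates.
        actions.append("mark_provisioning_complete")
    else:
        actions.append("create_tinkerbell_workflow")
    return actions
```

Defaulting the field to `False` keeps existing registrations on the current
provisioning path unless the user explicitly opts out.
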
## Workflow

```mermaid
sequenceDiagram
    autonumber
    participant User
    box rgba(32, 194, 142, 1) Edge Node
        participant DeviceDiscovery as Device Discovery Agent
        participant EdgeNode as Edge Node
    end

    box rgba(10, 184, 242, 1) Orchestrator Components
        participant API as API-v2
        participant TinkerNginx as Tinker-Nginx
        participant OnboardingMgr as Onboarding Manager
        participant DKAM
        participant Inventory
    end

    DKAM->>DKAM: Read EMF capabilities from infra-config
    DKAM->>TinkerNginx: Curate & host installer script

    User->>API: Register edge node using orch-cli/UI (skip provisioning = true)
    API->>Inventory: Create host resource with provisioning skipped
    API-->>User: Registration confirmed


    Note over User,EdgeNode: User shall trigger the onboarding flow
    User->>EdgeNode: Log in to edge node which has Ubuntu 22.04 or 24.04 pre-installed
    EdgeNode->>TinkerNginx: Download installer script to edge node
    EdgeNode->>EdgeNode: Run the installer script and install system packages
    EdgeNode->>EdgeNode: Upgrade kernel (if needed)
    EdgeNode->>EdgeNode: Install Device Discovery Agent (debian)
    DeviceDiscovery->>DeviceDiscovery: Start Device Discovery Agent
    DeviceDiscovery->>OnboardingMgr: Non-interactive onboarding request (TLS)
    alt If device not found
        OnboardingMgr->>DeviceDiscovery: Error Device not found
    else
        OnboardingMgr->>Inventory: Update Onboarding and Provisioning status as completed
        OnboardingMgr->>DeviceDiscovery: Return onboarding credentials
        EdgeNode->>EdgeNode: Install node agent and other EN agents
        EdgeNode->>EdgeNode: Configure agents with onboarding credentials
        EdgeNode->>EdgeNode: EN agents communicate with respective Infra managers
        EdgeNode->>EdgeNode: Enable and start all agents as systemd services
    end
    EdgeNode->>EdgeNode: Ready for Day2 operations (update & remote power management)
```

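
The `alt` branch in the diagram can be read as the following request handler.
This is a minimal in-memory sketch under stated assumptions: the `Inventory`
class, the status strings, and the credential format are invented for
illustration, and the real exchange happens over TLS between the device
discovery agent and the onboarding manager.

```python
class DeviceNotFoundError(Exception):
    """Raised when the requesting device was never registered."""


class Inventory:
    """In-memory stand-in for the inventory service."""

    def __init__(self) -> None:
        self.hosts: dict[str, dict[str, str]] = {}

    def register(self, serial: str) -> None:
        self.hosts[serial] = {"onboarding": "pending", "provisioning": "pending"}


def handle_onboarding_request(inventory: Inventory, serial: str) -> dict[str, str]:
    """Mirror the diagram: error if unknown, else mark done and return credentials."""
    host = inventory.hosts.get(serial)
    if host is None:
        raise DeviceNotFoundError(serial)
    host["onboarding"] = "completed"
    host["provisioning"] = "completed"
    # Placeholder credentials; the real ones are host-specific Keycloak credentials.
    return {"client_id": f"host-{serial}", "client_secret": "<issued-by-orchestrator>"}
```

In this reading, the remaining EN agents are installed and started only after
the handler returns credentials, which matches the ordering inside the `else`
branch.
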
## Opens

- In interactive onboarding, the instance resource is mapped to the OS
  resource by taking the OS version from the edge node during the device
  discovery stage.

- Cluster creation might need the required partitions to be created, which is
  done during the provisioning flow.

- The kernel may need to be upgraded, depending on the platform, to enable
  platform-specific features.
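
For the first open item, one way the OS version could be read on a
pre-installed Ubuntu node is from `/etc/os-release`. The sketch below only
parses that file format; how the resulting version would be matched to an OS
resource in inventory is left open, and the function name is an assumption.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dictionary."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields
```

On the edge node, the contents of `/etc/os-release` would be read and passed to
this function; the `VERSION_ID` field then carries the release, for example
`22.04` or `24.04`.
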
