Commit b918ebd

feature: open dmt integration in emf
1 parent c4eb860 commit b918ebd

2 files changed

+245
-0
lines changed

design-proposals/dmt-integration.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# Design Proposal: OpenDMT integration in Edge Manageability Framework

Author(s): Edge Infrastructure Manager Team

Last updated: 05/15/2025

## Abstract

[Open Device Management Toolkit](https://device-management-toolkit.github.io/docs/2.27/GetStarted/overview/) (Open DMT)
provides an open source stack for managing
vPRO/Active Management Technology (AMT)/Intel Standard Manageability (ISM) enabled devices.

![EMF Stack](./images/eim-int.svg)

This document describes the design proposal for seamlessly integrating OpenDMT components into EMF and making them
directly available to Edge Infrastructure Manager services or any other service running in the orchestrator.

**Note:** with reference to the above figure, only device activation/deactivation and power management will be
addressed in release 3.1.

## Proposal

The cloud toolkit includes two core components: Management Presence Server (MPS) and Remote Provisioning Server (RPS);
see [DMT documentation](https://device-management-toolkit.github.io/docs/2.27/GetStarted/overview/) for further details.

MPS and RPS are extended and deployed together with the Edge Infrastructure Manager micro-services; they are
cloud-native and can be deployed using the DMT charts. At the time of writing, a PoC has been realized showcasing their
initial integration in the Edge Infrastructure Manager charts.

The following diagram represents the deployment of the DMT services.

```mermaid
graph TD
    %% Traefik Gateway - Middle Level
    Traefik["Traefik Gateway"]
    %% MT-Gateway - Middle Level
    MTGW["MT Gateway"]
    %% Intel AMT* Services Box - Middle Section
    subgraph "Intel AMT* Services"
        RPS["Remote Provisioning Server (RPS)"]
        MPS["Management Presence Server (MPS)"]
    end
    %% Edge Node - Bottom Level
    subgraph "Edge Node"
        RPC["Remote Provisioning Client (RPC)"]
        AMT["AMT Device"]
    end
    %% Connections
    RPC -->|443/RPS-WS| Traefik
    AMT -->|4433/CIRA| MPS
    User -->|443| Traefik
    Traefik -->|443| MTGW
    MTGW -->|3000/AMT-Device|MPS
    MTGW -->|8080/Domain|RPS
    MTGW -->|8081/WS|RPS
```

The Remote Provisioning Client (RPC) application runs on the managed device/Edge Node and communicates with the RPS
microservice on the development system. The RPC and RPS configure and [activate](./vpro-device.md) Intel AMT on the
managed device. Once properly configured, the remote managed device can call home to the MPS by establishing a Client
Initiated Remote Access (CIRA) connection with the MPS.

CIRA enables a CIRA-capable edge device to initiate and establish a persistent connection to the MPS. As long as the
managed device is connected to the network and to a power source, it can maintain a persistent connection.

**Note1:** The CIRA connection is terminated directly in the MPS service.

**Note2:** Traffic on port 8081 is the WebSocket (WS) connection established between RPC and RPS; it is used to perform
the configuration.

**Note3:** Port 8080 is "exposed" by MT-GW to allow the configuration of the Domain and the Provisioning Certificate.

**Note4:** Port 3000 is "exposed" by MT-GW to allow the retrieval of the AMT device information and potentially to
expose audit logs and events to OBaaS.

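The port mapping in the diagram and notes above can be restated as a small lookup table. The sketch below is
illustrative only: `Route`, `ROUTES` and `routes_to` are hypothetical names, and the data simply mirrors the diagram
edges; it is not the actual Traefik or MT-GW configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    """One edge of the deployment diagram (hypothetical helper type)."""
    source: str
    target: str
    port: int
    purpose: Optional[str]  # label from the diagram, None when unlabeled


# Data copied from the diagram edges; nothing here is read from a live system.
ROUTES = [
    Route("RPC", "Traefik", 443, "RPS-WS"),
    Route("AMT", "MPS", 4433, "CIRA"),
    Route("User", "Traefik", 443, None),
    Route("Traefik", "MTGW", 443, None),
    Route("MTGW", "MPS", 3000, "AMT-Device"),
    Route("MTGW", "RPS", 8080, "Domain"),
    Route("MTGW", "RPS", 8081, "WS"),
]


def routes_to(target: str) -> List[Route]:
    """Return every diagram edge that terminates at the given component."""
    return [route for route in ROUTES if route.target == target]
```

For example, `routes_to("MPS")` returns the CIRA edge coming directly from the AMT device and the port 3000 edge
coming from MT-GW, which matches Note1 and Note4.
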
In the DMT stack, Mosquitto can be deployed as an MQTT broker to avoid constant polling of the MPS/RPS services. This
will be considered as future work to improve the scalability of the layered architecture.

Other tools such as Kong and Kuma, used respectively as traffic gateway and service mesh, will be replaced by Traefik
and Istio, which are the tools currently in use by the EMF platform.

DMT should be configured to leverage platform services such as the centralized database and the EMF secrets service.
However, the DMT services cannot be used out of the box and seamlessly integrated in EMF: they need to be aware of EMF
internals to store credentials in Vault (tokens expire after 1h), handle multi-tenancy properly, and validate the
tenant IDs.

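Because Vault tokens expire after 1h, a DMT service needs some form of refresh logic. The following is a minimal
sketch of one possible approach, a cached token that is renewed shortly before expiry. `RefreshingToken`, its `fetch`
callback and the 300 second margin are assumptions made for illustration; the sketch does not call Vault and does not
reflect how MPS/RPS currently handle secrets.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Cache a short-lived token and refetch it before it expires.

    `fetch` is a hypothetical callback returning (token, ttl_seconds); in a
    real deployment it would wrap the login call to the secrets service.
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        margin_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._margin_s = margin_s  # renew this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            token, ttl_s = self._fetch()
            self._token = token
            self._expires_at = now + ttl_s
        return self._token
```

Injecting the clock keeps the refresh behaviour testable without waiting for a real token to expire.
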
Additionally, tokens need to be properly handled and specific roles should be created in Keycloak. As regards the
database, MPS/RPS can share the same DB as the other EMF micro-services. It is required, though, to create a new
instance for the DMT services where the RPS/MPS tables will live logically separated from the EIM/CO/AO tables.

**Note:** tenantID in DMT uses the UUID format and can be provided as input to the RPC client when it is started.
However, the MPS/RPS services need to be
[extended](https://device-management-toolkit.github.io/docs/2.27/Reference/middlewareExtensibility/) in order to
properly handle multi-tenancy, as well as Keycloak tokens.

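Since the tenant ID is expected to be a UUID, validating it is straightforward. The sketch below shows one way to do
it with the Python standard library; `parse_tenant_id` is a hypothetical helper, and rejecting non-canonical spellings
(braces, missing hyphens, `urn:uuid:` prefix) is an assumption rather than a documented DMT requirement.

```python
import uuid


def parse_tenant_id(raw: str) -> uuid.UUID:
    """Parse a tenant ID, accepting only the canonical hyphenated UUID form."""
    try:
        parsed = uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"invalid tenant ID: {raw!r}") from exc
    # uuid.UUID also accepts braces, a urn:uuid: prefix and unhyphenated hex;
    # this sketch assumes only the canonical 8-4-4-4-12 form is allowed.
    if str(parsed) != raw.lower():
        raise ValueError(f"tenant ID is not in canonical UUID form: {raw!r}")
    return parsed
```
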
AMT/vPRO works in one of two mutually exclusive modes: Client Control Mode (CCM) and Admin Control Mode (ACM). The
first provides full access to the features of Intel® AMT, but it requires user consent for all redirection features.
The latter provides full access as well, but user consent is optional for supported redirection features; this comes
with the "penalty" of requiring additional configuration (Domain and Provisioning certificate).

**Note:** CCM is not suitable for our deployment scenarios, given that providing user consent implies having a monitor
on site, which might not be possible. For this reason, ACM will be the mode in use.

In terms of input, DMT requires the creation of the following configurations:

**Client Initiated Remote Access (CIRA)** config that enables a CIRA-capable edge device to initiate and establish a
persistent connection to the MPS. As long as the managed device is connected to the network and to a power source, it
can maintain a persistent connection. This
[configuration](https://device-management-toolkit.github.io/docs/2.27/GetStarted/Cloud/createCIRAConfig/) can be
automated using the information already available in the EMF environment variables, config maps, etc. See
[DM Resource Manager](../dm-manager) for more details.

**ACM profile** config that enables ACM mode on the device; it depends on the **Domain Configuration**.
This [configuration](https://device-management-toolkit.github.io/docs/2.27/GetStarted/Cloud/createProfileACM/) can be
automated using the information already available in the EMF environment variables, config maps, etc. See
[DM Resource Manager](../dm-manager) for more details.

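As an illustration of what automating these configurations from environment variables could look like, the sketch
below assembles a CIRA configuration from an environment mapping. Everything named here is an assumption: the
variable names (`DMT_CIRA_CONFIG_NAME`, `DMT_MPS_ADDRESS`, `DMT_MPS_CIRA_PORT`) and the dictionary keys are
placeholders and do not represent the RPS request schema; only the default port 4433 is taken from the deployment
diagram.

```python
from typing import Dict, Mapping, Union

# Hypothetical variable names, chosen for this sketch only.
REQUIRED_VARS = ("DMT_CIRA_CONFIG_NAME", "DMT_MPS_ADDRESS")


def build_cira_config(env: Mapping[str, str]) -> Dict[str, Union[str, int]]:
    """Assemble a CIRA config from environment-style settings."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    # 4433 is the CIRA port shown in the deployment diagram.
    port = int(env.get("DMT_MPS_CIRA_PORT", "4433"))
    if not 0 < port < 65536:
        raise ValueError(f"invalid CIRA port: {port}")
    # Placeholder keys, not the RPS API field names.
    return {
        "config_name": env["DMT_CIRA_CONFIG_NAME"],
        "mps_address": env["DMT_MPS_ADDRESS"],
        "mps_port": port,
    }
```

In a service this would typically be called with `os.environ`; passing a mapping explicitly keeps the function easy
to test.
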
**Domain profile** is required for ACM profile activation. This [configuration][domain-profile] cannot be automated
and requires the user to purchase and provide the provisioning certificate in PFX format. Additionally, the
**DNS suffix** must be set either manually through MEBX or via DHCP Option 15; it should match the FQDN of the
provisioning certificate.

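A pre-flight check of the DNS suffix against the certificate name could catch misconfiguration before activation is
attempted. The sketch below is a hypothetical helper (`dns_suffix_matches`), not part of DMT. By default it requires
the two names to be equal, ignoring case and a trailing dot; the optional `allow_parent` flag, which also accepts a
suffix that is a parent domain of the certificate name, is an assumption and should be checked against the AMT
provisioning rules before relying on it.

```python
def dns_suffix_matches(cert_fqdn: str, dns_suffix: str, allow_parent: bool = False) -> bool:
    """Compare the DNS suffix with the provisioning certificate FQDN."""
    cert = cert_fqdn.strip().rstrip(".").lower()
    suffix = dns_suffix.strip().rstrip(".").lower()
    if not cert or not suffix:
        return False
    if cert == suffix:
        return True
    # Assumption: optionally treat the suffix as a parent domain of the FQDN.
    # The leading dot prevents "badexample.com" from matching "example.com".
    return allow_parent and cert.endswith("." + suffix)
```
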
For this configuration we expect the user to interact directly with RPS. This means that extensions to MT-GW will be
required too, and RPS should be extended in order to handle multi-tenancy (MT).

```mermaid
sequenceDiagram
    %%{wrap}%%
    autonumber
    participant US as User
    participant TR as Traefik
    participant MT as MT-GW
    participant RPS as RPS
    participant PS as psqlDB
    US ->> TR: Create Domain Profile
    activate TR
    TR ->> TR: Verify JWT token
    TR ->> MT: Create Domain Profile
    activate MT
    MT ->> MT: Extract ProjectID
    MT ->> RPS: Create Domain Profile
    activate RPS
    RPS ->> RPS: Verify JWT token
    RPS ->> RPS: Extract ProjectID
    RPS ->> PS: Store Domain Profile
    RPS ->> MT: OK
    deactivate RPS
    MT ->> TR: OK
    deactivate MT
    TR ->> US: OK
    deactivate TR
```

The configuration is per-tenant and we expect each tenant to have its own provisioning certificate. The user can
change the `Domain` configuration by removing the existing one and uploading a new one. There will be multiple domain
configurations depending on how the edge infrastructure is deployed (ideally, each site will have multiple network
segments).

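The per-tenant behaviour described above, where a `Domain` configuration is changed by removing it and uploading a
new one, can be sketched with an in-memory store. `DomainStore` and its methods are hypothetical names used for
illustration; the real data would live in the database behind RPS, keyed by the tenant (project) ID.

```python
from typing import Dict, List


class DomainStore:
    """In-memory stand-in for tenant-scoped domain configurations."""

    def __init__(self) -> None:
        # tenant ID -> domain name -> provisioning certificate (PFX bytes)
        self._by_tenant: Dict[str, Dict[str, bytes]] = {}

    def upload(self, tenant_id: str, name: str, pfx: bytes) -> None:
        domains = self._by_tenant.setdefault(tenant_id, {})
        if name in domains:
            # No in-place update: the existing domain must be removed first.
            raise KeyError(f"domain {name!r} already exists for tenant {tenant_id!r}")
        domains[name] = pfx

    def remove(self, tenant_id: str, name: str) -> None:
        try:
            del self._by_tenant[tenant_id][name]
        except KeyError:
            raise KeyError(f"domain {name!r} not found for tenant {tenant_id!r}") from None

    def names(self, tenant_id: str) -> List[str]:
        return sorted(self._by_tenant.get(tenant_id, {}))
```

Two tenants can hold a domain with the same name without seeing each other's certificate, and one tenant can hold
several domains, for example one per network segment.
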
**Note:** it is important to control the environment end to end (e2e); it is not possible to transfer devices from one
domain to another without disruptions.

**WLAN configuration** is not supported by GNU/Linux derived OSes. See [documentation][wireless-config] for more details.

**LAN configuration** is not considered in the existing requirements. This configuration needs to be pushed through RPS
and cannot be automated in any way by EIM. See [documentation][lan-config] for more details.

## Rationale

Using the DMT services directly has the undeniable advantage of providing a baseline to start with; otherwise we would
have to start from scratch. However, the PoC makes clear that some extensions are required.

One shortcoming of the MPS/RPS services is that they are written in Node.js.

Another aspect to consider is that DMT does not cover all the features exposed by vPRO SKUs, and in the future we might
be required to extend its capabilities in order to support advanced features such as reprovisioning the device using
the HTTPS boot option or secure remote erase.

For this reason, and for those stated above, it is crucial to start thinking about rewriting the DMT core services
using another technology such as **Go**.

Another advantage of a rewrite would be the handling of the migrations and the creation of the DB, which at the time of
writing are done using a [manual process](https://device-management-toolkit.github.io/docs/2.27/Deployment/upgradeVersion/).

An alternative design choice is to not expose the MPS/RPS services through the MT-GW and instead bridge the requests
through EIM. How to achieve this, and whether we should pursue it, is left as an open question.

## Affected components and Teams

We report hereafter the affected components and teams:

- Several platform services will be affected, and active support from the Foundational Platform Services (FPS) team is
  required to execute the integration with FPSs.
  - IAM/MT-GW, Keycloak, Database, Traefik, Istio, Vault are the main services affected
- UI should support
  - the creation and the removal of the Domain configurations by extending the Admin page.
  - power management commands exposed by MPS
- Automation and infrastructure teams should pay careful attention when setting up the environments to test the technology

## Implementation plan

The proposed plan for release 3.1 is presented below as a list of steps.

- DMT stack is integrated and deployed as part of the `infra-external` charts
- Split user-db creation and integrate db-creation as part of the charts
- Move user creation to the installer script
- Introduce new roles and possibly groups to have fine-grained tokens
- Substitute Kong and Kuma with Traefik and Istio, respectively
- Vault root creation and refresh logic need to be properly implemented
- Integrate MT-GW with RPS/MPS and expose their services
- Extend MPS to properly handle JWT tokens and ActiveProjectID
  - Requests from the north will have ActiveProjectID and the JWT token
  - Requests from the south will have only the JWT token
- Extend RPS to properly handle JWT tokens and ActiveProjectID
  - Requests from the north will have ActiveProjectID and the JWT token
  - Requests from the south will have only the JWT token
- UI to integrate with the necessary APIs exposed by RPS and MPS

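The north/south distinction in the MPS and RPS steps above can be sketched as a small request check. The sketch makes
several assumptions: that the JWT travels as a `Bearer` token in the `Authorization` header, that `ActiveProjectID` is
carried as an HTTP header with that name, and that header names are matched case-insensitively. It only checks that
the credentials are present; verifying the JWT signature and claims is out of scope. `missing_credentials` is a
hypothetical helper.

```python
from typing import List, Mapping

NORTH = "north"  # requests arriving through Traefik/MT-GW
SOUTH = "south"  # requests arriving from the edge node


def missing_credentials(direction: str, headers: Mapping[str, str]) -> List[str]:
    """List the credentials a request lacks for its direction (presence only)."""
    if direction not in (NORTH, SOUTH):
        raise ValueError(f"unknown direction: {direction!r}")
    normalized = {key.lower(): value.strip() for key, value in headers.items()}
    missing = []
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        missing.append("jwt")
    # Only northbound requests are expected to carry the project ID.
    if direction == NORTH and not normalized.get("activeprojectid"):
        missing.append("ActiveProjectID")
    return missing
```

An empty list means the request carries what its direction requires; a southbound request is accepted without
`ActiveProjectID`, in line with the plan above.
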
## Test plan

**Unit tests** will be extended accordingly in the affected components and possibly in the DMT components; the
extensions and the unit tests will be upstreamed.

**VIP tests** should verify the deployment and the FPS/IAM integration. Additionally, tests should be written to verify
Domain creation and the issuing of power management commands.

New **HIP tests** involving hardware devices will be written to verify the complete e2e flow.

All the aforementioned tests should include negative and failure scenarios such as failed activations and unsupported
operations.

## Open issues (if applicable)

Integration with Mosquitto is left for future iterations.

MPSRouter is additionally deployed to address MPS scalability. Should FPS consider its integration and dependency with Istio?

The OpenDMT stack offers device audit logs and events. Should OBaaS consider integrating these features in the stack?

If there are issues with the stack, we either fork or open GitHub issues. Shall we consider rewriting MPS/RPS in Go?
The tight deadlines make this prohibitive.

[domain-profile]: https://device-management-toolkit.github.io/docs/2.27/GetStarted/Cloud/createProfileACM/#create-a-domain-profile/
[wireless-config]: https://device-management-toolkit.github.io/docs/2.27/Reference/EA/RPSConfiguration/remoteIEEE8021xConfig/
[lan-config]: https://device-management-toolkit.github.io/docs/2.27/Reference/EA/RPSConfiguration/remoteIEEE8021xConfig/
