|
| 1 | +# Design Proposal: OpenDMT integration in Edge Manageability Framework |
| 2 | + |
| 3 | +Author(s): Edge Infrastructure Manager Team |
| 4 | + |
| 5 | +Last updated: 05/15/2025 |
| 6 | + |
| 7 | +## Abstract |
| 8 | + |
| 9 | +[Open Device Management Toolkit](https://device-management-toolkit.github.io/docs/2.27/GetStarted/overview/) (Open DMT) |
| 10 | +provides an open source stack through which it is possible to manage |
| 11 | +vPRO/Active Management Technology (AMT)/Intel Standard Manageability (ISM) enabled devices. |
| 12 | + |
| 13 | + |
| 14 | + |
| 15 | +This document describes the design proposal for integrating OpenDMT components in EMF seamlessly and making them |
| 16 | +directly available to Edge Infrastructure Manager services or any other service running in the orchestrator. |
| 17 | + |
| 18 | +**Note:** With reference to the above figure, only device activation/deactivation and power management will be |
| 19 | +addressed in release 3.1. |
| 20 | + |
| 21 | +## Proposal |
| 22 | + |
| 23 | +The cloud-toolkit includes two core components: Management Presence Server (MPS) and Remote Provisioning Server (RPS), |
| 24 | +see [DMT documentation](https://device-management-toolkit.github.io/docs/2.27/GetStarted/overview/) for further details. |
| 25 | + |
| 26 | +MPS and RPS are extended and deployed together with the Edge Infrastructure Manager micro-services; they are cloud-native |
| 27 | +and can be deployed using the DMT charts. At the time of writing, a PoC has been realized showcasing their initial |
| 28 | +integration in the Edge Infrastructure Manager charts. |
| 29 | + |
| 30 | +In the following diagram, we represent the deployment of the DMT services. |
| 31 | + |
| 32 | +```mermaid |
| 33 | +graph TD |
| 34 | + %% Traefik Gateway - Middle Level |
| 35 | + Traefik["Traefik Gateway"] |
| 36 | + %% MT-Gateway - Middle Level |
| 37 | + MTGW["MT Gateway"] |
| 38 | + %% Intel AMT* Services Box - Middle Section |
| 39 | + subgraph "Intel AMT* Services" |
| 40 | + RPS["Remote Provisioning Server (RPS)"] |
| 41 | + MPS["Management Presence Server (MPS)"] |
| 42 | + end |
| 43 | + %% Connections |
| 44 | + %% Edge Node - Bottom Level |
| 45 | + subgraph "Edge Node" |
| 46 | + RPC["Remote Provisioning Client (RPC)"] |
| 47 | + AMT["AMT Device"] |
| 48 | + end |
| 49 | + %% Connections |
| 50 | + RPC -->|443/RPS-WS| Traefik |
| 51 | + AMT -->|4433/CIRA| MPS |
| 52 | + User -->|443| Traefik |
| 53 | + Traefik -->|443| MTGW |
| 54 | + MTGW -->|3000/AMT-Device|MPS |
| 55 | + MTGW -->|8080/Domain|RPS |
| 56 | + MTGW -->|8081/WS|RPS |
| 57 | +``` |
| 58 | + |
| 59 | +The Remote Provisioning Client (RPC) application runs on the managed device/Edge Node and communicates with the RPS |
| 60 | +microservice on the development system. The RPC and RPS configure and [activate](./vpro-device.md) Intel AMT on the |
| 61 | +managed device. Once properly configured, the remote managed device can call home to the MPS by establishing a Client |
| 62 | +Initiated Remote Access (CIRA) connection with the MPS. |
| 63 | + |
| 64 | +CIRA enables a CIRA-capable edge device to initiate and establish a persistent connection to the MPS. As long as the |
| 65 | +managed device is connected to the network and to a power source, it can maintain a persistent connection. |
| 66 | + |
| 67 | +**Note1:** The CIRA connection is terminated directly in the MPS service. |
| 68 | + |
| 69 | +**Note2:** Traffic on port 8081 is the WebSocket (WS) connection established between RPC and RPS, used to perform the configuration. |
| 70 | + |
| 71 | +**Note3:** Port 8080 is "exposed" by MT-GW to allow the configuration of the Domain and the Provisioning Certificate. |
| 72 | + |
| 73 | +**Note4:** Port 3000 is "exposed" by MT-GW to allow the retrieval of the AMT device information and potentially expose |
| 74 | +audit logs and events to OBaaS. |
| 75 | + |
| 76 | +In the DMT stack, Mosquitto can be deployed as an MQTT broker to avoid constant polling of the MPS/RPS services. This will be |
| 77 | +considered as future work to improve the scalability of the layered architecture. |
| 78 | + |
| 79 | +Other tools such as Kong and Kuma, used respectively as traffic gateway and service mesh, will be replaced by Traefik |
| 80 | +and Istio, which are the tools currently in use by the EMF platform. |
| 81 | + |
| 82 | +DMT should be configured to leverage platform services such as the centralized Database and EMF secrets service. |
| 83 | +However, the DMT services cannot be used out of the box and seamlessly integrated in EMF: they need to be aware of EMF |
| 84 | +internals to store credentials in Vault (token expiring after 1h), properly handle multitenancy, and validate the tenant |
| 85 | +IDs. |
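
The one-hour Vault token lifetime mentioned above implies some refresh logic on the DMT side. The following is a minimal sketch in Python of the timing check only. The helper name `needs_refresh` and the five-minute safety margin are assumptions made for illustration; they do not come from DMT, Vault, or EMF.

```python
from datetime import datetime, timedelta, timezone

# From the text above: the Vault token expires after 1h.
TOKEN_TTL = timedelta(hours=1)
# Assumption: renew a few minutes early so in-flight requests never use an expired token.
REFRESH_MARGIN = timedelta(minutes=5)


def needs_refresh(issued_at: datetime, now: datetime,
                  ttl: timedelta = TOKEN_TTL,
                  margin: timedelta = REFRESH_MARGIN) -> bool:
    """Return True once the token is within `margin` of its expiry."""
    return now >= issued_at + ttl - margin


if __name__ == "__main__":
    issued = datetime(2025, 5, 15, 12, 0, tzinfo=timezone.utc)
    # Half-way through the lifetime: no refresh needed yet.
    print(needs_refresh(issued, issued + timedelta(minutes=30)))
    # Inside the safety margin: the token should be renewed.
    print(needs_refresh(issued, issued + timedelta(minutes=56)))
```

A service would call such a check before each Vault access and request a new token when it returns true; how the new token is obtained is outside the scope of this sketch.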
| 86 | + |
| 87 | +Additionally, tokens need to be properly handled and specific roles should be created in Keycloak. As regards the |
| 88 | +database, MPS/RPS can share the same DB as the other EMF micro-services. However, a new instance must be created |
| 89 | +for the DMT services, where the RPS/MPS tables will live logically separated from the EIM/CO/AO tables. |
| 90 | + |
| 91 | +**Note:** tenantID in DMT uses the UUID format and can be provided as input to the RPC client when it is started. |
| 92 | +However, MPS/RPS services need to be |
| 93 | +[extended](https://device-management-toolkit.github.io/docs/2.27/Reference/middlewareExtensibility/) in order to |
| 94 | +properly handle multi-tenancy as well as Keycloak tokens. |
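
Since the tenant ID is a UUID, tenant validation can start with a strict format check. A minimal sketch in Python using the standard `uuid` module follows. The function name `parse_tenant_id` and the choice to accept only the canonical hyphenated form are assumptions for illustration, not DMT behavior.

```python
import uuid


def parse_tenant_id(raw: str) -> uuid.UUID:
    """Parse a tenant ID, accepting only the canonical hyphenated UUID form."""
    try:
        parsed = uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(f"invalid tenant ID: {raw!r}") from exc
    # uuid.UUID also accepts braces, a urn:uuid: prefix and missing hyphens.
    # Assumption: those variants are rejected so that stored IDs compare equal as strings.
    if str(parsed) != raw.lower():
        raise ValueError(f"tenant ID is not in canonical UUID form: {raw!r}")
    return parsed


if __name__ == "__main__":
    print(parse_tenant_id("123e4567-e89b-12d3-a456-426614174000"))
```

Rejecting non-canonical spellings is a design choice: it avoids two textual forms of the same tenant ID ending up as distinct keys in the database.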
| 95 | + |
| 96 | +AMT/vPRO works in one of two mutually exclusive modes: Client Control Mode (CCM) and Admin Control Mode (ACM). CCM |
| 97 | +provides full access to the features of Intel® AMT, but it requires user consent for all redirection features. ACM |
| 98 | +provides full access as well, but user consent is optional for supported redirection features; this comes with the |
| 99 | +"penalty" of requiring an additional (Domain and Provisioning certificate) configuration. |
| 100 | + |
| 101 | +**Note:** CCM is not suitable for our deployment scenarios, given that providing user consent implies having a monitor on |
| 102 | +site, which might not be possible. For this reason, ACM will be the mode in use. |
| 103 | + |
| 104 | +In terms of input, DMT requires the creation of the following configurations: |
| 105 | + |
| 106 | +**Client Initiated Remote Access (CIRA)** config that enables a CIRA-capable edge device to initiate and establish a |
| 107 | +persistent connection to the MPS. As long as the managed device is connected to the network and to a power source, it |
| 108 | +can maintain a persistent connection. This |
| 109 | +[configuration](https://device-management-toolkit.github.io/docs/2.27/GetStarted/Cloud/createCIRAConfig/) can be |
| 110 | +automated using the set of information already available in the EMF env variables, config maps, etc. See |
| 111 | +[DM Resource Manager](../dm-manager) for more details. |
| 112 | + |
| 113 | +**ACM profile** config that enables the ACM mode in the device; it depends on the **Domain Configuration**. |
| 114 | +This [configuration](https://device-management-toolkit.github.io/docs/2.27/GetStarted/Cloud/createProfileACM/) can be |
| 115 | +automated using the set of information already available in the EMF env variables, config maps, etc. See |
| 116 | +[DM Resource Manager](../dm-manager) for more details. |
| 117 | + |
| 118 | +**Domain profile** is required by the ACM profile activation. This [configuration][domain-profile] cannot be automated |
| 119 | +and requires the user to purchase and provide the provisioning certificate in PFX format. Additionally, the |
| 120 | +**DNS suffix** must be set either manually through MEBX or via DHCP Option 15; it should match the FQDN of the |
| 121 | +provisioning certificate. |
| 122 | + |
| 123 | +For this configuration, we expect the user to interact directly with RPS. This means that extensions to MT-GW will |
| 124 | +be required too, and RPS should be extended in order to handle multitenancy (MT). |
| 125 | + |
| 126 | +```mermaid |
| 127 | +sequenceDiagram |
| 128 | + %%{wrap}%% |
| 129 | + autonumber |
| 130 | + participant US as User |
| 131 | + participant TR as Traefik |
| 132 | + participant MT as MT-GW |
| 133 | + participant RPS as RPS |
| 134 | + participant PS as psqlDB |
| 135 | + US ->> TR: Create Domain Profile |
| 136 | + activate TR |
| 137 | + TR ->> TR: Verify JWT token |
| 138 | + TR ->> MT: Create Domain Profile |
| 139 | + activate MT |
| 140 | + MT ->> MT: Extract ProjectID |
| 141 | + MT ->> RPS: Create Domain Profile |
| 142 | + activate RPS |
| 143 | + RPS ->> RPS: Verify JWT token |
| 144 | + RPS ->> RPS: Extract ProjectID |
| 145 | + RPS ->> PS: Store Domain Profile |
| 146 | + RPS ->> MT: OK |
| 147 | + deactivate RPS |
| 148 | + MT ->> TR: OK |
| 149 | + deactivate MT |
| 150 | + TR ->> US: OK |
| 151 | + deactivate TR |
| 152 | +``` |
| 153 | + |
| 154 | +The configuration is per-tenant and we expect each tenant to have its own provisioning certificate. The user can |
| 155 | +change the `Domain` configuration by removing the existing one and uploading a new one. There will be multiple domain |
| 156 | +configurations depending on how the edge infrastructure is deployed (ideally in each site there will be multiple network |
| 157 | +segments). |
| 158 | + |
| 159 | +**Note:** it is important to control the environment end to end (e2e); devices cannot be transferred from one domain to |
| 160 | +another without disruption. |
| 161 | + |
| 162 | +**WLAN configuration** is not supported by GNU/Linux derived OSes. See [documentation][wireless-config] for more details. |
| 163 | + |
| 164 | +**LAN configuration** is not considered in the existing requirements. This configuration needs to be pushed through RPS |
| 165 | +and cannot be automated in any way by EIM. See [documentation][lan-config] for more details. |
| 166 | + |
| 167 | +## Rationale |
| 168 | + |
| 169 | +Using the DMT services directly has the undeniable advantage of providing a baseline to start with; otherwise we would |
| 170 | +have to start from scratch. However, the PoC makes it clear that some extensions are required. |
| 171 | + |
| 172 | +One shortcoming of the MPS/RPS services is that they are written using Node.js. |
| 173 | + |
| 174 | +Another aspect to consider is that DMT does not cover all the features exposed by vPRO SKUs, and in the future we might |
| 175 | +need to extend its capabilities to support advanced features such as reprovisioning the device using the HTTPS boot |
| 176 | +option or secure remote erase. |
| 177 | + |
| 178 | +For this reason, and given what is stated above, it is crucial to start thinking about rewriting the DMT core services |
| 179 | +using another technology such as **Go**. |
| 180 | + |
| 181 | +Another undeniable advantage is the handling of the migrations and the creation of the DB, which at the time of writing |
| 182 | +are done using a [manual process](https://device-management-toolkit.github.io/docs/2.27/Deployment/upgradeVersion/). |
| 183 | + |
| 184 | +Another design choice is to not expose the MPS/RPS services through the MT-GW and instead bridge the requests through |
| 185 | +EIM. How to achieve this, and whether we should pursue it, is left as an open question. |
| 186 | + |
| 187 | +## Affected components and Teams |
| 188 | + |
| 189 | +We report hereafter the affected components and teams: |
| 190 | + |
| 191 | +- Several platform services will be affected and the active support from the Foundational Platform Services team is |
| 192 | +required to execute the integration with FPSs. |
| 193 | + - IAM/MT-GW, Keycloak, Database, Traefik, Istio, Vault are the main services affected |
| 194 | +- UI should support |
| 195 | + - the creation and the removal of the Domain configurations by extending the Admin page. |
| 196 | + - power management commands exposed by MPS |
| 197 | +- Automation and infrastructure teams should pay careful attention when setting up the environments to test the technology |
| 198 | + |
| 199 | +## Implementation plan |
| 200 | + |
| 201 | +The proposed plan for release 3.1 is presented below as a list of steps. |
| 202 | + |
| 203 | +- DMT stack is integrated and deployed as part of the `infra-external` charts |
| 204 | +- Split user-db creation and integrate db-creation as part of the charts |
| 205 | +- Move user creation to the installer script |
| 206 | +- Introduce new roles and possibly groups to have fine-grained tokens |
| 207 | +- Substitute Kong and Kuma respectively with Traefik and Istio |
| 208 | +- Vault root creation and refresh logic need to be properly implemented |
| 209 | +- Integrate MT-GW with RPS/MPS and expose their services |
| 210 | +- Extend MPS to properly handle JWT tokens and ActiveProjectID |
| 211 | + - Requests from the north will have ActiveProjectID and the JWT token |
| 212 | + - Requests from the south will have only the JWT token |
| 213 | +- Extend RPS to properly handle JWT tokens and ActiveProjectID |
| 214 | + - Requests from the north will have ActiveProjectID and the JWT token |
| 215 | + - Requests from the south will have only the JWT token |
| 216 | +- UI to integrate with the necessary APIs exposed by RPS and MPS |
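
The north/south distinction in the MPS and RPS steps above can be sketched as a presence check on incoming requests. This is a minimal Python illustration under stated assumptions: the header names `ActiveProjectID` and `Authorization`, the bearer scheme, and the helper names are hypothetical, and the sketch only checks that the expected fields are present. It does not verify the JWT signature or claims.

```python
from typing import List, Mapping, Optional

# Assumption: illustrative header names; the plan only mentions "ActiveProjectID" and a JWT token.
PROJECT_HEADER = "ActiveProjectID"
AUTH_HEADER = "Authorization"


def bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token carried in the request, or None if absent."""
    scheme, _, token = headers.get(AUTH_HEADER, "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def check_request(headers: Mapping[str, str], northbound: bool) -> List[str]:
    """List what is missing from a request; an empty list means it can proceed."""
    problems = []
    if bearer_token(headers) is None:
        problems.append("missing JWT bearer token")
    # Only requests from the north are expected to carry the project ID.
    if northbound and not headers.get(PROJECT_HEADER):
        problems.append("missing ActiveProjectID")
    return problems


if __name__ == "__main__":
    print(check_request({"Authorization": "Bearer abc"}, northbound=True))
    print(check_request({"Authorization": "Bearer abc"}, northbound=False))
```

Note that a plain mapping is case-sensitive, whereas HTTP header names are not; a real middleware would normalize header names before the lookup.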
| 217 | + |
| 218 | +## Test plan |
| 219 | + |
| 220 | +**Unit tests** will be extended accordingly in the affected components and in the DMT components; where possible, the |
| 221 | +extensions and their unit tests will be upstreamed. |
| 222 | + |
| 223 | +**VIP tests** should verify the deployment and FPS/IAM integration. Additionally, tests should be written to verify Domain |
| 224 | +creation and issuing power management commands. |
| 225 | + |
| 226 | +New **HIP tests** involving hardware devices will be written to verify the complete e2e flow. |
| 227 | + |
| 228 | +All the aforementioned tests should include negative and failure scenarios such as failed activations and unsupported |
| 229 | +operations. |
| 230 | + |
| 231 | +## Open issues (if applicable) |
| 232 | + |
| 233 | +Integration with Mosquitto is left for future iterations. |
| 234 | + |
| 235 | +MPSRouter is additionally deployed to address MPS scalability. Should FPS consider its integration and dependency with Istio? |
| 236 | + |
| 237 | +The OpenDMT stack offers device audit logs and events. Should OBaaS consider integrating these features in the stack? |
| 238 | + |
| 239 | +If there are issues with the stack, either we fork or we open GitHub issues. Should we consider rewriting MPS/RPS in Go? |
| 240 | +The tight deadlines make this prohibitive. |
| 241 | + |
| 242 | +[domain-profile]: https://device-management-toolkit.github.io/docs/2.27/GetStarted/Cloud/createProfileACM/#create-a-domain-profile/ |
| 243 | +[wireless-config]: https://device-management-toolkit.github.io/docs/2.27/Reference/EA/RPSConfiguration/remoteIEEE8021xConfig/ |
| 244 | +[lan-config]: https://device-management-toolkit.github.io/docs/2.27/Reference/EA/RPSConfiguration/remoteIEEE8021xConfig/ |