diff --git a/README.md b/README.md index d9634c46c..bb6cdee56 100644 --- a/README.md +++ b/README.md @@ -48,9 +48,9 @@ distributed edges - [UI](https://github.com/open-edge-platform/orch-ui): The web user interface for the Edge Orchestrator, allowing the user to manage most of the features of the product in an intuitive, visual, manner without having to trigger a series of APIs individually. -- [CLI](https://github.com/open-edge-platform/orch-cli): The command line interface for the Edge Orchestrator, allowing the -user to manage most of the features of the product in an intuitive, text-based manner without having to trigger a series -of APIs individually. +- [CLI](https://github.com/open-edge-platform/orch-cli): The command line interface for the Edge Orchestrator, +allowing the user to manage most of the features of the product in an intuitive, +text-based manner without having to trigger a series of APIs individually. - [Observability](https://docs.openedgeplatform.intel.com/edge-manage-docs/main/developer_guide/observability/index.html): A modular observability stack that provides visibility into the health and performance of the system, including logging, reporting, alerts, and SRE data from Edge Orchestrator components and Edge Nodes. diff --git a/design-proposals/app-orch-deploy-applications.md b/design-proposals/app-orch-deploy-applications.md index fff3ca2f4..ee430f0d7 100755 --- a/design-proposals/app-orch-deploy-applications.md +++ b/design-proposals/app-orch-deploy-applications.md @@ -197,7 +197,8 @@ like Profiles and Parameter Templates are lost. - Update the `Edit Deployment` page similar to the changes to `Create Deployment` - Update the `Deployments` list page to support linkage to both Apps and DP - - Update the `Application` page to have a deployment link and display the `is_deployed` field in both list and detail view. + - Update the `Application` page to have a deployment link and display the `is_deployed` + field in both list and detail view. 
- Update any status tables and dashboards as necessary to support these changes. diff --git a/design-proposals/eim-nbapi-cli-decomposition.md b/design-proposals/eim-nbapi-cli-decomposition.md new file mode 100644 index 000000000..69d923db1 --- /dev/null +++ b/design-proposals/eim-nbapi-cli-decomposition.md @@ -0,0 +1,440 @@ +# Design Proposal: Scenario-Specific Northbound APIs and CLI Commands for EIM Decomposition
+
+Author(s): Edge Infrastructure Manager Team
+
+Last updated: 18/12/25
+
+## Abstract
+
+In the context of EIM decomposition, the North Bound API service should be treated as an
+independent, interchangeable module.
+The [EIM proposal for modular decomposition](https://github.com/open-edge-platform/edge-manageability-framework/blob/main/design-proposals/eim-modular-decomposition.md)
+calls out a need for exposing both a full set of EIM APIs and a need for exposing only a subset of EIM APIs
+as required by individual workflows taking advantage of a modular architecture.
+This proposal explores how the exposed APIs can be decomposed
+and adjusted to reflect only the supported EIM services for a particular scenario.
+It defines how different scenarios can be supported by API versions that match only the services
+and features required per scenario, while keeping full API support in place.
+
+## Background and Context
+
+There are multiple levels of APIs currently available within EMF, with individual specs available for
+each domain in
+[orch-utils](https://github.com/open-edge-platform/orch-utils/tree/main/tenancy-api-mapping/openapispecs/generated).
+
+The list of domain APIs includes:
+
+- Catalog and Catalog utilities APIs
+- App deployment manager and app resource manager APIs
+- Cluster APIs
+- EIM APIs
+- Alert Monitoring APIs
+- MPS and RPS APIs
+- Metadata broker and Tenancy APIs
+
+There are two levels to the API decomposition:
+
+- **Cross-domain decomposition**: Separation of the above domain-level APIs
+(e.g., only exposing EIM APIs - without Cluster APIs, App Orchestrator APIs and others).
+- **Intra-domain decomposition**: Separation within a domain (e.g., at the EIM domain level,
+where the overall set of APIs may include onboarding/provisioning/Day 2 APIs,
+but another workflow may support only onboarding/provisioning without Day 2 support).
+
+The following questions must be answered and investigated:
+
+- How is the API service built currently?
+  - It is built from a proto definition and code is autogenerated by the "buf" tool -
+  [See How NB API is Currently Built](#how-nb-api-is-currently-built)
+- How is the API service container image built currently?
+- How are the API service Helm charts built currently?
+- What level of decomposition is needed for the required workflows?
+- How to decompose APIs at the domain level?
+  - At the domain level, the APIs are deployed as separate services
+- How to decompose APIs within the domain level?
+- How to build various API service versions as per desired workflows using the modular APIs?
+- How to deliver the various API service versions as per desired workflows?
+- How to expose the list of available APIs for client consumption (orch-cli)?
+
+### Scenarios to be Supported by the Initial Decomposition
+
+The currently planned decomposition work is focused on the EIM layer. The following is the list of deployment scenarios:
+
+- **Full EMF** - Full EMF including all existing levels of APIs.
+- **EIM Only** - EIM installed on its own, includes only the existing EIM APIs.
+- **EIM vPRO Only** - EIM installed on its own, including only the EIM APIs required to support vPRO use cases.
+
+### About EIM API (apiv2)
+
+In Edge Infrastructure Manager (EIM), the apiv2 service represents the North Bound API service that exposes
+the EIM operations to the end user, who uses the Web UI, Orch-CLI, or direct API calls. Currently,
+the end user is not allowed to call the EIM APIs directly. API calls first reach the API gateway external
+to EIM (the Traefik gateway), where they are mapped to EIM internal API endpoints and passed on to EIM.
+
+**Note**: The current mapping of external APIs to internal APIs is 1:1, with no direct mapping to SB APIs.
+The API service communicates with Inventory via gRPC, which then manages the SB API interactions.
+
+**Apiv2** is just one of the EIM Resource Managers that talk to one EIM internal component - the Inventory - over gRPC.
+Similar to other RMs, it updates and retrieves the status of Inventory resources, allowing the user
+to perform operations on EIM resources for manipulating Edge Nodes.
+In EMF 2025.2, the apiv2 service is deployed via a Helm chart managed by Argo CD as one of its applications.
+The apiv2 service runs in a container started from the apiv2 service container image.
+
+#### How NB API is Currently Built
+
+Currently, apiv2 (infra-core repository) holds the definition of REST API services in protocol buffer files
+(.proto) and uses protoc-gen-connect-openapi to autogenerate the OpenAPI spec - openapi.yaml.
+
+The input to protoc-gen-connect-openapi comes from:
+
+- `api/proto/services` directory - one file (services.proto) containing API operations on
+all the available resources (Service Layer).
+- `api/proto/resources` directory - multiple files with data models - a separate file with the data
+model for each single inventory resource.
+
+Protoc-gen-connect-openapi is the tool that is indirectly used to build the openapi spec.
+It is configured as a plugin within buf (buf.gen.yaml).
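+As a rough illustration of that wiring — the plugin names, output paths, and options below are assumptions
+based on the tools named above, not the repository's actual configuration — a buf.gen.yaml could look like:
+
+```yaml
+# buf.gen.yaml (illustrative sketch; plugin names, paths, and options are assumptions)
+version: v1
+plugins:
+  # protoc-gen-connect-openapi: emits the OpenAPI 3.0 spec from the .proto services
+  - plugin: connect-openapi
+    out: api/openapi
+  # Standard Go codegen for the message types
+  - plugin: go
+    out: internal/pbapi
+    opt: paths=source_relative
+  # REST-to-gRPC gateway handlers
+  - plugin: grpc-gateway
+    out: internal/pbapi
+    opt: paths=source_relative
+```
+
+Running `buf generate` against such a config is what produces the spec, Go types, and gateway code listed below.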
+
+#### About Buf
+
+Buf is a replacement for protoc (the standard Protocol Buffers compiler). It makes working with
+.proto files easier, as it replaces messy protoc commands with a clean config file.
+It is an all-in-one tool, providing compiling, linting, breaking change detection, and dependency management.
+
+In infra-core/apiv2, the **buf generate** command is executed within the **make generate** or
+**make buf-gen** target to generate the OpenAPI 3.0 spec directly from the .proto files in the api/proto/ directory.
+
+The protoc-gen-connect-openapi plugin takes as input the single full service definition that includes all services
+(services.proto) and outputs the openapi spec in api/openapi.
+
+Key Items:
+
+- Input: api/proto/**/*.proto
+- Config: buf.gen.yaml, buf.work.yaml, buf.yaml
+- Output: openapi.yaml
+- Tool: protoc-gen-connect-openapi
+
+Based on the content of api/proto/, buf also generates:
+
+- The Go code (Go structs, gRPC clients/services) in internal/pbapi.
+- gRPC gateway: REST-to-gRPC proxy code - HTTP handlers that proxy REST calls to gRPC (in internal/pbapi/**/*.pb.gw.go).
+- Documentation: docs/proto.md.
+
+## Decomposing the API service
+
+An investigation needs to be conducted into how the API service can be decomposed to be rebuilt as various
+flavors of the same API service providing different sets of APIs.
+
+**Design Principles:**
+
+1. **Single Source of Truth**: The total set of APIs serves as the main source of the API service,
+and other API subsets are automatically derived from this based on required functionality.
+This makes maintenance simple and centralized.
+
+2. **Domain-Level Decomposition**: The API service should be decomposed at the domain level,
+meaning that all domains or a subset of domains should be available as part of the EMF.
+   - At this level, APIs are already decomposed/modular and deployed as separate services
+   (e.g., EIM APIs, Cluster APIs, App Orchestrator APIs).
+   - **For EIM-focused scenarios**: Only the EIM domain APIs would be included.
+
+3. **Intra-Domain Decomposition**: The API service should be decomposed within the domain level, meaning
+that only a subset of available APIs may need to be released and/or exposed at the API service level.
+   - **Example**: Within the EIM domain, we may not want to expose Day 2 functionality for some workflows,
+   even though Day 2 operations are part of the full EIM OpenAPI spec.
+   - This allows workflows focused on onboarding/provisioning to omit upgrade, maintenance, and troubleshooting APIs.
+
+4. **Resource-Level Decomposition**: The API service may also need to be decomposed at the individual internal service level.
+   - **Example**: The Host resource might need different data models across use cases.
+   - **Note**: This would require separate data models and may increase complexity significantly.
+
+The following options for decomposing or exposing subsets of APIs were investigated:
+
+- ~~API Gateway that would only expose certain endpoints to the user~~ - this is a no-go for us, as we plan
+to remove the existing API Gateway, and it does not actually solve the problem of releasing only specific flavours of EMF.
+- Maintain multiple OpenAPI specifications - while it is possible to create multiple OpenAPI specs,
+the maintenance of the same APIs across specs will be a large burden - still, let's keep this option in consideration in
+terms of auto-generating multiple specs from the top spec.
+- ~~Authentication & Authorization Based Filtering~~ - this is a no-go for us, as we do not control the
+end users of the EMF, and we want to provide a tailored, modular product for each workflow.
+- ~~API Versioning strategy~~ - Creating different API versions for each use case - too much overhead
+without benefits, similar to maintaining multiple OpenAPI specs.
+- ~~Proxy/Middleware Layer~~ - Similar to API Gateway - does not fit our use cases.
+- OpenAPI Spec Manipulation - This approach uses OpenAPI's extension mechanism (properties starting with x-)
+to add metadata that describes which audiences, use cases, or clients should have access to specific endpoints,
+operations, or schemas. This approach is worth investigating to see if it can give us an automated approach for
+creating individual OpenAPI specs for workflows based on labels.
+- Another approach would be to manipulate how a flavour of the OpenAPI spec can be generated from the main spec, or how
+the API service can be built conditionally from the same spec.
+
+### Proposal: Decomposing the release of API service as a module
+
+This section describes how the apiv2 (NB API) service will be built, packaged, and released,
+enabling scenario-specific variants:
+
+- The build of the API service itself will depend on the results of "top-to-bottom"
+and "bottom-to-top" decomposition investigations.
+- API subsets supported per scenario will be stored in the respective scenario manifest.
+- `buf generate` will use only the proto files for the services related to the scenario.
+- Separate container images will be built per scenario, each supporting only
+the required API subset and versioned accordingly:
+  - `apiv2:x.x.x` (full EMF)
+  - `apiv2:eim-x.x.x` (full EIM only)
+  - `apiv2:eim-vpro-x.x.x` (EIM only for vPRO)
+- A single Helm chart for all scenarios will use a dedicated value to select the scenario-specific image.
+- Argo profiles can specify different scenarios (e.g., `orch-configs/profiles/eim-only-vpro.yaml`
+sets `eimScenario: eim-only-vpro` in the deployment configuration).
+
+**Recommended Release Approach:** Build and release multiple apiv2 container images - one per scenario.
+A single Helm chart for all scenarios will use a dedicated value to select the scenario-specific image.
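+To make the image-selection mechanism concrete, here is a sketch of the shared chart's values. The
+`eimScenario` key and the scenario-prefixed tag convention come from this proposal; the registry path
+and exact value names are placeholders, not the final chart schema:
+
+```yaml
+# Shared apiv2 chart values (illustrative sketch; registry path is a placeholder)
+image:
+  repository: registry.example.com/edge-orch/apiv2
+  tag: eim-vpro-1.0.0  # scenario-prefixed tag, e.g. apiv2:eim-vpro-x.x.x
+
+# Selected by the Argo profile, e.g. orch-configs/profiles/eim-only-vpro.yaml
+eimScenario: eim-only-vpro
+```
+
+With this shape, each Argo profile only has to override `image.tag` (or `eimScenario`) to deploy its variant.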
+
+**Justification:**
+
+`buf generate` doesn't just create OpenAPI specs — it generates the entire
+Go codebase (related to the APIs defined in the spec) including:
+
+- Go structs based on proto definitions
+- gRPC client and server code
+- HTTP gateway handlers (REST to gRPC)
+- Type conversions and validators
+
+**Pros:**
+
+- ✅ Only compiles and includes needed services per scenario (smaller images)
+- ✅ Explicit API subset per image
+- ✅ Clear separation between scenarios
+- ✅ Better security (unused code doesn't exist)
+- ✅ Single Helm chart to maintain
+- ✅ Image selection in Helm chart controlled by a value that includes the scenario name
+
+**Cons:**
+
+- Multiple images to build and maintain in CI/CD
+- Need to rebuild all images for common code changes
+
+### Proposal: How to Build the EIM API Service per Scenario
+
+The apiv2 service built per scenario will expose only the required APIs,
+while preserving compatibility across scenarios.
+
+#### Restructure Proto Definitions
+
+Split the monolithic `services.proto` file into multiple folders/files per service:
+
+```bash
+ infra-core/apiv2/api/proto/services/
+ ├── onboarding/
+ │   └── v1/
+ │       └── service1.proto
+ ├── provisioning/
+ │   └── v1/
+ │       └── service2.proto
+ ├── maintenance/
+ │   └── v1/
+ │       └── service3.proto
+ └── telemetry/
+     └── v1/
+         └── service4.proto
+```
+
+#### Define Scenario Manifests
+
+Maintain scenario manifests that list the REST API services supported by each scenario.
+Scenario manifest files will be kept in `infra-core/apiv2`. The following are examples of the manifests:
+
+```yaml
+ # scenarios/eim-only.yaml
+ name: eim-only
+ description: Only EIM
+ services:
+   - onboarding
+   - provisioning
+   - maintenance
+   - telemetry
+
+ # scenarios/eim-vpro-only.yaml
+ name: eim-vpro
+ description: EIM vPRO Only
+ services:
+   - onboarding
+```
+
+**Why manifest files:**
+
+- Makefile-driven builds read the manifest to determine which services to compile.
+- Version controlled in the git repository.
+- No database dependencies.
+
+#### Modify Build Process
+
+Modify the **buf-gen** make target to read the manifests and build the openapi spec as per the scenario manifest.
+An example of the **buf generate** command to generate code supporting the onboarding and provisioning services:
+
+```bash
+ buf generate api/proto/services/onboarding/v1 api/proto/services/provisioning/v1
+```
+
+The generated `openapi.yaml` file will contain only the services supported by the particular scenario.
+The output file can be named per scenario. The build will also generate the corresponding Go types,
+gRPC gateway code, and handlers for those APIs. An image will be built per scenario and pushed separately.
+
+## Consuming the Scenario Specific APIs from the CLI
+
+### Proposal
+
+The best approach would be for the EMF to provide a service that communicates which endpoints/APIs are
+currently supported by the deployed API service, as proposed in
+[Design Proposal: Orchestrator Component Status Service](https://github.com/open-edge-platform/edge-manageability-framework/blob/main/design-proposals/platform-component-status-service.md).
+Development of such a service is outside the scope of this ADR.
+
+#### CLI Workflow
+
+1. **Build**: The CLI is built based on the full REST API spec.
+2. **Capability Discovery on Login**: The CLI queries the new capabilities service endpoint upon user login
+to request API capability information.
+3. **Configuration Caching**: The CLI saves the supported API configuration locally.
+4. **Command Validation**: Before executing commands, the CLI checks the cached configuration and executes
+only the commands supported by the currently deployed scenario.
+5. **Error Handling**:
+   - For CLI commands: Display a user-friendly error message.
+   - For direct curl calls: The API returns HTTP 404 (endpoint not found) or 501 (HTTP method not implemented).
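Steps 2-4 of the workflow above can be sketched in Go. The `Capabilities` shape mirrors the example
response used elsewhere in this document; the field names and validation helper are assumptions for
illustration, since the capabilities service itself is not yet defined:

```go
package main

import (
	"fmt"
	"slices"
)

// Capabilities mirrors an assumed response shape for the capabilities
// endpoint; the field names here are illustrative, not a committed schema.
type Capabilities struct {
	Scenario string   `json:"scenario"`
	APIs     []string `json:"apis"`
}

// supported reports whether the service backing a CLI command is present
// in the capability list cached at login time.
func supported(c Capabilities, service string) bool {
	return slices.Contains(c.APIs, service)
}

func main() {
	// Pretend this was fetched at login and cached in ~/.orch-cli/config.
	cached := Capabilities{
		Scenario: "eim-vpro",
		APIs:     []string{"onboarding", "provisioning"},
	}
	// Gate each command on the cached capability list before calling the API.
	for _, svc := range []string{"provisioning", "maintenance"} {
		if supported(cached, svc) {
			fmt.Printf("%s: allowed\n", svc)
		} else {
			fmt.Printf("%s: not supported by scenario %s\n", svc, cached.Scenario)
		}
	}
}
```

The same check can drive help output, hiding command groups whose backing service is absent from the cache.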
+
+#### CLI Login Command Flow
+
+```bash
+ ┌─────────────────┐
+ │ User runs       │
+ │ orch-cli login  │
+ └────────┬────────┘
+          │
+          ▼
+ ┌─────────────────────────┐
+ │ GET /../capabilities    │ ← Example of the new service endpoint
+ └────────┬────────────────┘
+          │
+          ▼
+ ┌──────────────────────────┐
+ │ Response:                │
+ │ {                        │
+ │   "scenario": "eim-vpro",│
+ │   "apis": [              │ ← Example of the new service response
+ │     "onboarding",        │
+ │     "provisioning"       │
+ │   ]                      │
+ │ }                        │
+ └────────┬─────────────────┘
+          │
+          ▼
+ ┌─────────────────────────┐
+ │ CLI caches config       │
+ │ in ~/.orch-cli/config   │
+ └─────────────────────────┘
+```
+
+## Summary of Action Items
+
+### 1. Traefik Gateway Compatibility
+
+- The Traefik gateway will be removed for all workflows. User API calls will access EIM internal endpoints directly.
+- Investigate the impact of removing the gateway.
+
+### 2. Scenario Definition and API Mapping
+
+- Define all supported scenarios:
+  - Full EMF
+  - EIM-only
+  - EIM-only vPRO
+- For each scenario, document:
+  - Required services (which resource managers are needed)
+  - Required API endpoints (which operations are exposed)
+  - Deployment configuration (Helm values, profiles)
+
+### 3. Data Model Changes
+
+- Collaborate with teams/ADR owners to establish (per scenario):
+  - Required changes at the Inventory level
+  - Impact on APIs from these changes (changes in data models)
+
+## Summary of Current Requirements
+
+- Provide scenario-based EIM API sets (full and subsets).
+- Preserve API compatibility with Inventory.
+- Deliver per-scenario OpenAPI specs and container images.
+- Maintain a single source of truth for API definitions with automated generation of scenario-specific API specs.
+- Keep the CLI operable against any scenario via discovery, caching, and command validation.
+- Provide error handling for missing APIs per scenario.
+- Support Helm-driven configuration (image/tag, scenario selection).
+- Support API selection per scenario through Mage/ArgoCD.
+
+## Rationale
+
+The approach aims to narrow the operational API surface to the specific scenarios being targeted,
+while ensuring the full EMF remains available for deployments.
+The proposed solution to API decomposition enables incremental decomposition that can be adopted
+progressively without breaking existing integrations or workflows.
+
+## Investigation Needed
+
+The following investigation tasks will drive validation of the decomposition approach:
+
+1. Validate the feasibility of splitting services.proto and generating per-scenario specs via buf/protoc-gen-connect-openapi.
+2. Evaluate Inventory data model variations per scenario.
+3. Verify the impact of **1** and **2** on gRPC gateway generation and handler registration per scenario (buf code generation).
+4. Validate Argo CD application configs or Mage targets for scenario-specific deployments.
+
+## Implementation Plan for Orch CLI
+
+1. Add login-time scenario discovery: retrieve the scenario's supported APIs from the new service.
+2. Cache discovered capabilities in the orch-cli config.
+3. Validate user commands against supported APIs.
+4. Implement error handling for unsupported APIs.
+5. Adjust help to hide unsupported commands/options.
+6. Define E2E tests targeting all scenarios.
+
+## Implementation Plan for EIM API
+
+1. Restructure Proto Files
+   - Split the monolithic `services.proto` into service-scoped folders
+   (e.g.: onboarding, provisioning, maintenance, telemetry)
+   - Each service in its own directory: `api/proto/services/<service>/v1/<service>.proto`
+
+2. Create Scenario Manifests
+   - Add a `scenarios/` directory with YAML files for each scenario
+   - Define service lists per scenario
+
+3. Update Makefile Build Process
+   - Modify the `buf-gen` target to read the scenario manifest and generate only the specified services.
+   - Modify the Makefile to allow building per-scenario images.
+   - Add a `docker-build-all` target to build images for all scenarios.
+   - Modify the image naming convention.
+
+4.
Update Helm Chart
+   - Use a single, common Helm chart for all scenarios
+   - Add a new value to select which scenario image to deploy (e.g.: `image.tag`)
+
+5. ArgoCD Integration
+
+6. CI/CD Pipeline
+   - Build all scenario images in CI
+   - Tag with both scenario name and version
+   - Push all images to the registry
+
+## Test plan
+
+Tests will verify that minimal and full deployments work as expected, that clients can discover
+supported features, and that errors are clear.
+
+- CLI integration with the new service: The CLI can discover supported services; absence returns descriptive messages.
+- CLI E2E: Login discovery, caching, command blocking, error messaging.
+- Deployment E2E: Deploy each scenario via mage and verify that the expected endpoints exist and work.
+- Regression: Verify the full EMF scenario behaves identically to pre-decomposition.
+
+## Open Issues
+
+- Traefik gateway removal and impacts.
+- What happens when the service does not exist and the CLI expects it to exist?
+- Detailed scenario definitions on the Inventory level - NB APIs should be aligned
+with the Inventory resource availability in each scenario.
+- Managing the apiv2 image version used by the infra-core argo application - deployment level.
+- Scenario deployment through argocd/mage.
+- What will be the image naming convention (per scenario)?
+(example: `apiv2-<scenario>:<version>` or `apiv2:<scenario>-<version>`)
diff --git a/design-proposals/eim-pxe-with-managed-emf.md b/design-proposals/eim-pxe-with-managed-emf.md index 9bcbe9beb..40afd304e 100644 --- a/design-proposals/eim-pxe-with-managed-emf.md +++ b/design-proposals/eim-pxe-with-managed-emf.md @@ -30,7 +30,8 @@ Given its small footprint it is possible to deploy PXE server on site using seve In this solution, the PXE server only stores the `ipxe.efi` binary (that is downloaded from the remote orchestrator), and serves it to local Edge Nodes attempting the PXE boot. During the PXE boot, ENs download `ipxe.efi` and boot into it.
-The iPXE script includes a logic to fetch IP address from a local DHCP server and download Micro-OS from the remote EMF orchestrator.
+The iPXE script includes logic to fetch an IP address from a local DHCP server and download Micro-OS
+from the remote EMF orchestrator.
 Once booted into Micro-OS, the provisioning process is taken over by the cloud-based orchestrator.
 From now on, ENs communicate with the remote EMF orchestrator to complete OS provisioning.
 The secure channel is ensured by using HTTPS communication with JWT authorization.
diff --git a/design-proposals/platform-installer-simplification.md b/design-proposals/platform-installer-simplification.md index 0f3f4f9a2..c1bdd1508 100644 --- a/design-proposals/platform-installer-simplification.md +++ b/design-proposals/platform-installer-simplification.md @@ -449,9 +449,9 @@ deployment is delayed.
 #### Eliminate ArgoCD
-Once the syncwaves have been reduced or eliminated, then it is feasible to eliminate ArgoCD in favor of a simpler tool. We
-will explore alternatives such as umbrella charts, the helmfile tool, or other opensource solutions. We may explore repo
-and/or chart consolidation to make the helm chart structure simpler.
+Once the syncwaves have been reduced or eliminated, it is feasible to eliminate ArgoCD in favor of a
+simpler tool. We will explore alternatives such as umbrella charts, the helmfile tool, or other open-source
+solutions. We may explore repo and/or chart consolidation to make the helm chart structure simpler.
 Eliminating argocd will allow the following pods to be eliminated from the platform:
diff --git a/design-proposals/vpro-device.md b/design-proposals/vpro-device.md index 8c22912f4..04eb05482 100644 --- a/design-proposals/vpro-device.md +++ b/design-proposals/vpro-device.md @@ -313,4 +313,5 @@ had access to DMT capabilities.
This provided critical recovery mechanisms including the ability to remotely reboot the device if provisioning got stuck, access the device out-of-band for troubleshooting, and recover from provisioning failures without requiring physical access to the device. -By moving activation to post-OS deployment, we lose all these recovery capabilities during the critical OS provisioning phase. +By moving activation to post-OS deployment, we lose all these recovery capabilities during the critical +OS provisioning phase.