diff --git a/docs/INTEGRATING_AGENTS.md b/docs/INTEGRATING_AGENTS.md index 5b7e70f1c2..4bf44ed48f 100644 --- a/docs/INTEGRATING_AGENTS.md +++ b/docs/INTEGRATING_AGENTS.md @@ -336,6 +336,9 @@ Note that a Package version. Can be: - A digest (`@sha256:...`) - Both tag and digest (`:v1.0.0@sha256:...`), when both are specified the digest takes precedence. +> [!WARNING] +> The package in the OCI repository **MUST** follow a specific [structure](./package_manager.md#package-structure). + **Accessing Package Contents:** After installation, the package directory path is available via the reserved variable `${nr-sub:packages..dir}`, where `` is the key used in the packages map. diff --git a/docs/oci_repository.md b/docs/oci_repository.md deleted file mode 100644 index 0ac6101abd..0000000000 --- a/docs/oci_repository.md +++ /dev/null @@ -1,66 +0,0 @@ -# OCI Repository and AgentControl - -## Overview -AgentControl manages agent packages (and in the future agentTypes) distributed as OCI (Open Container Initiative) artifacts. -The package manager handles downloading, extracting, installing packages from OCI registries. - -Package references are constructed from three components: -- Registry URL (e.g., `registry.example.com`) -- Repository path (e.g., `agents/my-agent`) -- Version (optional): Can be a tag (`:v1.0.0`), a digest (`@sha256:...`), when both are specified the digest takes precedence. - -This data is taken from the Packages section of the [AgentType configuration](./INTEGRATING_AGENTS.md). - -## Package Installation Process - -When an agent needs to install or update a package, the package manager leverages the following paths: -``` -temp_package_path: /packages//__temp_packages// -final_path: /packages//stored_packages// -``` - -The `final_path` location is where the extracted package will reside after installation and can be referenced -by the agent through the variable `${nr-sub:packages.infra-agent.dir}`. - -**Steps**: -1. Create temporary download directory -2. 
Download artifact (expects exactly 1 layer/file), if the file was already downloaded, skip download -3. Create final installation directory -4. Extract archive based on `PackageType` (tar.gz or zip) derived from the mime type -5. Delete temporary directory (always, even on failure) - -Currently, the whole operation blocks the sub-agent thread until it terminates. -Notice that the old subAgent (and therefore the binary) is stopped before the new one is downloaded and executed. -In the next iterations, we will have a non-blocking implementation to avoid the subAgent to be blocked by this operation. - -## Error Handling - -**Installation Failures**: -- Download errors → Retry if configured, then fail -- Invalid artifact (not exactly 1 file) → Fail with `InvalidData` -- Extraction errors → Delete partial installation directory, fail -- Temp cleanup errors → Installation fails - - -## Local Development -When developing and debugging locally, you can use a local OCI registry. You can run it using zot: -```bash -$ ./tools/oci-registry.sh run -``` - -Notice that AC is already configured to use HTTP as protocol when connecting to `localhost:5001` if executed/built __without__ `--release`. - -## Installation Process -Currently, there is no installation step or script execution, just extraction. -We expect to support installation scripts in the future. TODO - -## Signature Verification -TODO @danielorihuela - -## Garbage collection -TODO not implemented yet - -## Agent Types Management -TODO not implemented yet - - diff --git a/docs/package_manager.md b/docs/package_manager.md new file mode 100644 index 0000000000..db1e13b088 --- /dev/null +++ b/docs/package_manager.md @@ -0,0 +1,260 @@ +# Package Manager + +Agent Control (AC) manages agent packages (and in the future agent types) distributed as OCI ([Open Container Initiative](https://opencontainers.org/)) artifacts. 
+The package manager handles download, signature verification, extraction, installation and garbage collection of packages from OCI registries. + +## Package structure (in the OCI repository) + +The packaged agent must comply with the [OCI image spec](https://github.com/opencontainers/image-spec). The entrypoint can either +be a [manifest](https://github.com/opencontainers/image-spec/blob/main/manifest.md#oci-image-manifest-specification) JSON file or an [index](https://github.com/opencontainers/image-spec/blob/main/image-index.md#oci-image-index-specification) JSON file, but for AC we expect the entrypoint to be an `index`. That's the way OCI supports multi-arch. + +> [!NOTE] +> The OCI spec has some [rough edges, at least around the multi-arch topic](https://github.com/oras-project/oras/issues/1538). + +The `index` file includes a list of manifest files. AC expects: + +* each manifest entry in the array to have a `platform` section with the `architecture` and `os` fields + + Check fields and possible values in the [Image Index Property Descriptions section of the spec](https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions). However, to know if AC supports a specific architecture and os, we must look at the [oci_client crate support](https://docs.rs/oci-client/latest/oci_client/manifest/struct.Platform.html) and the [AC compatibility section](https://docs.newrelic.com/docs/new-relic-control/agent-control/overview/#requirements-and-compatibility). If it appears on both, then it's supported. + + The section is automatically handled by the [oci_client](https://github.com/oras-project/rust-oci-client/blob/4541487ec759eef2a511e385d95e72e78aa37f69/src/client.rs#L974), which automatically detects the platform and retrieves the correct manifest from the index. 
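
The platform matching described above can be illustrated with a minimal sketch. It is not the `oci_client` implementation: `IndexEntry`, `oci_arch`, `oci_os` and `select_manifest` are hypothetical names introduced here, and the name mapping assumes the GOARCH/GOOS-style values used in the index example of this document.

```rust
/// Hypothetical, simplified view of one manifest entry in an OCI image index.
#[derive(Debug, Clone, PartialEq)]
struct IndexEntry {
    digest: String,
    architecture: String,
    os: String,
}

/// Map Rust's architecture names to the GOARCH-style names used in OCI indexes.
/// Unknown names are passed through unchanged (an assumption of this sketch).
fn oci_arch(rust_arch: &str) -> &str {
    match rust_arch {
        "x86_64" => "amd64",
        "aarch64" => "arm64",
        other => other,
    }
}

/// Map Rust's OS names to the GOOS-style names used in OCI indexes.
fn oci_os(rust_os: &str) -> &str {
    match rust_os {
        "macos" => "darwin",
        other => other,
    }
}

/// Return the first entry whose `platform` matches the given os/architecture pair.
fn select_manifest<'a>(
    entries: &'a [IndexEntry],
    os: &str,
    arch: &str,
) -> Option<&'a IndexEntry> {
    entries
        .iter()
        .find(|entry| entry.os == os && entry.architecture == arch)
}

fn main() {
    // Digests shortened for readability; real entries carry full sha256 digests.
    let entries = vec![
        IndexEntry {
            digest: "sha256:aaa".to_string(),
            architecture: "amd64".to_string(),
            os: "linux".to_string(),
        },
        IndexEntry {
            digest: "sha256:bbb".to_string(),
            architecture: "arm64".to_string(),
            os: "linux".to_string(),
        },
    ];

    // Detect the host platform and look for a matching manifest entry.
    let os = oci_os(std::env::consts::OS);
    let arch = oci_arch(std::env::consts::ARCH);
    match select_manifest(&entries, os, arch) {
        Some(entry) => println!("selected manifest {} for {os}/{arch}", entry.digest),
        None => println!("no manifest published for {os}/{arch}"),
    }
}
```

An index without a `platform` section on its entries would give this kind of lookup nothing to match on, which is why AC expects both fields to be present.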
+ +> [!NOTE] +> The `artifactType` field in a given manifest entry of the `index` is not verified by AC, nor by the [client](https://docs.rs/oci-client/latest/oci_client/manifest/struct.ImageIndexEntry.html). + +Other than that, it must follow the OCI spec. + +Example: + +```json +{ + "schemaVersion": 2, + "mediaType": "application/vnd.oci.image.index.v1+json", + "manifests": [ + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:82677ba32d1276debe264d14ec5f7b1c61e2a9acbc8c6a6dff779d7133ec8487", + "size": 617, + "platform": { + "architecture": "amd64", + "os": "linux" + }, + "artifactType": "application/vnd.newrelic.agent.v1" + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:5a16021a5101f7ae0583cddae44ea715ad2cfd618b61b8982de1b847958260da", + "size": 617, + "platform": { + "architecture": "arm64", + "os": "linux" + }, + "artifactType": "application/vnd.newrelic.agent.v1" + }, + { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:13e6d06647bbaf4f44d4c29bb57e1078c9919da92e2aee3443c122c24b86d3cb", + "size": 502, + "platform": { + "architecture": "amd64", + "os": "windows" + }, + "artifactType": "application/vnd.newrelic.agent.v1" + } + ] +} +``` + +Then we have the `manifest` file, which includes the package data and metadata. + +> [!NOTE] +> AC can handle artifacts regardless of the entrypoint (manifest or index). +> However, AC suggests using the index approach. If the author of an agent decides to use a manifest as the entrypoint, they must understand that AC might behave in strange ways. + +There, AC expects to find some specific keys and values. + +* `config` must exist, have value `application/vnd.oci.image.config.v1+json` and contain the `architecture` and `os` [properties](https://github.com/opencontainers/image-spec/blob/main/config.md#properties) + + ```json + { + "architecture": "amd64", + "os": "linux" + } + ``` + + This is not required by AC itself. 
However, it adds support for external tooling. For example, [oras](https://github.com/oras-project/oras). + +* `artifactType` must exist and have value `application/vnd.newrelic.agent.v1` + +* `layers/mediaType` must take one of the following values: + + - `application/vnd.newrelic.agent.content.v1.zip` + - `application/vnd.newrelic.agent.content.v1.tar+gzip` + - `application/vnd.newrelic.agent-type.content.v1.tar+gzip` + +Example: + +```json +{ + "schemaVersion": 2, + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "artifactType": "application/vnd.newrelic.agent.v1", + "config": { + "mediaType": "application/vnd.oci.image.config.v1+json", + "digest": "sha256:7758599fc4d06bd93a65bf28bc98fbff6c559a9a56be1ec3d75ff6aa8a8cfe6e", + "size": 39 + }, + "layers": [ + { + "mediaType": "application/vnd.newrelic.agent.content.v1.zip", + "digest": "sha256:2e2e87f3a9403e735bee76c166b7139be36c1a76079f786e21ab2ce138cd9a1a", + "size": 21678636, + "annotations": { + "com.newrelic.artifact.type": "package", + "org.opencontainers.image.title": "newrelic-infra-amd64.zip", + "org.opencontainers.image.version": "1.71.3" + } + } + ], + "annotations": { + "org.opencontainers.image.created": "2026-01-23T08:07:06Z" + } +} +``` + +## Package references + +Package references are constructed from the data configured in the [AgentType `packages` section](./INTEGRATING_AGENTS.md#packages). + +## Package Installation Process + +When an agent needs to install or update a package, the package manager leverages the following paths: +``` +temp_package_path: /packages//__temp_packages// +final_path: /packages//stored_packages// +``` + +The `final_path` location is where the extracted package will reside after installation and can be referenced +by the agent through the variable `${nr-sub:packages.infra-agent.dir}`. + +**Steps**: +1. Create temporary download directory +2. Download artifact (expects exactly 1 layer/file), if the file was already downloaded, skip download +3. 
Create final installation directory +4. Extract archive based on `PackageType` (tar.gz or zip) derived from the MIME type +5. Delete temporary directory (always, even on failure) + +The whole operation blocks the sub-agent thread until it terminates. +Notice that the old sub-agent (and therefore the binary) is stopped before the new one is downloaded and executed. +In the next iterations, we will have a non-blocking implementation to avoid the sub-agent being blocked by this operation. + +There is no installation step or script execution, just extraction. We expect to support installation scripts in the future. +TODO: not implemented yet + +## Signature Verification + +AC supports signature verification. This assures users that a given agent was uploaded by the expected author and has not been tampered with. + +The signature in the OCI repository follows the [Simple Signing format](https://github.com/sigstore/cosign/blob/main/specs/SIGNATURE_SPEC.md#payloads) (the only supported format) and it's been created with the [external tool process](https://docs.sigstore.dev/cosign/signing/signing_with_containers/#sign-and-upload-a-generated-payload-in-another-format-from-another-tool). + +> [!NOTE] +> New Relic uses an owned repository. It doesn't need extra services like [Rekor](https://docs.sigstore.dev/logging/overview/) or [Fulcio](https://docs.sigstore.dev/certificate_authority/overview/). That's the reason why Agent Control uses the external tool process instead of `cosign sign`. + +As a result of the "external tool process", the OCI repository will contain two packages. One for the agent and one for the signature. The signature package contains, among other things, the payload that was signed (in JSON format) and its signature in base64. Inside the payload, we find the hash of the signed agent package. This is enough to verify the signature of an artifact, as we will see later. + +Public keys are **ALWAYS** downloaded when verifying a signature.
This avoids problems with outdated caches, like using a revoked key. +Public keys **MUST** be in JWKS format. + +```json +{ + "keys": [ + { + "kty": "OKP", + "alg": null, + "use": "sig", + "kid": "key/0", + "n": null, + "e": null, + "x": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "y": null, + "crv": "Ed25519" + }, + { + "kty": "OKP", + "alg": null, + "use": "sig", + "kid": "key/1", + "n": null, + "e": null, + "x": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", + "y": null, + "crv": "Ed25519" + } + ] +} +``` + +Verification flow: + +```mermaid +sequenceDiagram + participant AC as Agent Control + participant OCI as OCI Repository + participant JWKS as PublicKey server + + Note over AC, JWKS: Step 1: Check artifact signature + AC->>JWKS: Download public keys + JWKS-->>AC: Public keys in JWKS format + AC->>OCI: Download Signature Package + OCI-->>AC: Base64 Signature & Payload + AC->>AC: Verify base64 signature against the payload (holds for at least one public key) + AC->>AC: Extract artifact hash from payload + + Note over AC, OCI: Step 2: Download signed artifact + AC->>OCI: Download artifact by hash + OCI-->>AC: Artifact + + Note over AC: Step 3: Integrity Check + AC->>AC: Compute downloaded artifact hash + AC->>AC: Compare computed hash against the hash inside the payload + + Note over AC: Result: Package Verified & Trusted +``` + +## Key Rotation + +What happens during a key rotation? It depends on the specific use case. AC tries to verify the signature with every public key published for that package; this approach avoids downtime during key rotation. Note that a non-revoked key must exist in the list, otherwise signature verification will fail. + +## Garbage collection + +AC keeps track of the latest installed package. Each install operation executes an old package purge operation, which retains the latest tracked package (i.e. the package currently running) and the newly installed package. You can think of it like a FIFO with size 2.
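
The retention rule can be modeled with a small sketch. This is an in-memory illustration only, not AC's implementation: `PackageStore`, `install` and `restart` are hypothetical names, and versions are plain strings instead of directories on disk.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory model of the stored packages for a single agent.
#[derive(Debug, Default)]
struct PackageStore {
    /// Versions currently kept on disk.
    stored: BTreeSet<String>,
    /// Latest tracked (currently running) version, if any.
    current: Option<String>,
}

impl PackageStore {
    /// Install `version`, then purge everything except the previously
    /// tracked version and the newly installed one.
    fn install(&mut self, version: &str) {
        self.stored.insert(version.to_string());
        let previous = self.current.clone();
        self.stored
            .retain(|v| v.as_str() == version || Some(v) == previous.as_ref());
        // The newly installed package becomes the tracked one.
        self.current = Some(version.to_string());
    }

    /// On restart, only the tracked version is retained.
    fn restart(&mut self) {
        let current = self.current.clone();
        self.stored.retain(|v| Some(v) == current.as_ref());
    }
}

fn main() {
    let mut store = PackageStore::default();
    // Install order chosen to show that retention follows install order,
    // not version order.
    for version in ["1.0.0", "3.0.0", "2.0.0"] {
        store.install(version);
        println!("after installing {version}: {:?}", store.stored);
    }
    store.restart();
    println!("after restart: {:?}", store.stored);
}
```

Keeping the previously tracked package until the next install or restart is what makes the store behave like a two-slot FIFO.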
+ +On restart, AC only retains the current package. The previous package will be deleted. + +Example: + +1. User installs infra agent version 1.0.0 (system stores infra 1.0.0) +2. User installs infra agent version 3.0.0 (system stores infra 1.0.0 and 3.0.0) +3. User installs infra agent version 2.0.0 (system stores infra 2.0.0 and 3.0.0) +4. Restart (system stores infra 2.0.0) + +## Error Handling + +**Installation Failures**: +- Download errors → Retry if configured, then fail +- Invalid artifact (not exactly 1 file) → Fail with `InvalidData` +- Extraction errors → Delete partial installation directory, fail +- Temp cleanup errors → Installation fails + +## Local Development + +If needed, you can run a local OCI registry using [zot](https://github.com/project-zot/zot) with + +```bash +$ ./tools/oci-registry.sh run +``` + +Notice that AC is already configured to use HTTP as protocol when connecting to `localhost:5001` if executed/built __without__ `--release`. + +## Agent Types Management +TODO not implemented yet