AC manages a workload by reading a description of how to run, configure and manage it and how to check its health, in the form of a YAML file. This file describes what we call an agent type definition. In some places in the codebase, we might refer to the workload created for a certain agent type as an agent type instance.
A set of agent type definitions is shipped built into AC, but the AC team or external teams can also add new supported agents without rebuilding AC. See Where agent type definitions come from for the sources AC resolves definitions from and their precedence.
Each agent type definition targets a single (platform, operating_system) pair. The platform is either host or kubernetes; operating_system is required when platform: host (linux or windows) and must not be set when platform: kubernetes. An agent that supports more than one such pair (for example, the Infrastructure Agent which runs on host Linux, host Windows and Kubernetes) is defined by one YAML file per pair, all sharing the same namespace, name and version. At startup, Agent Control loads only the definitions whose platform (and operating_system, when platform: host) match the binary it's running in.
The definition for an agent type consists of a single YAML file with three main areas, described below.
We recommend that you read the following sections, but at any time feel free to check the currently available definitions in their dedicated docs to see working examples of the explained concepts.
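As a rough orientation before the detailed sections, the sketch below shows how the three areas might sit together in one file. It only reuses field names that appear later in this document; the values are illustrative and the `# ...` comments stand for content explained in the corresponding section, so treat it as an outline rather than a shipped definition.

```yaml
# 1. Metadata: identifies the agent type and the platform it targets
namespace: newrelic
name: com.newrelic.infrastructure
version: 0.1.0            # version of the definition, not of the agent binary
protocol_version: "1.0"   # top-level, parsed separately from the metadata
platform: kubernetes

# 2. Variables: optional, exposes dynamic configuration
variables:
  chart_version:
    description: "nri-bundle chart version"
    type: string
    required: true

# 3. Deployment: required, describes how AC creates and manages the workload
deployment:
  objects:
    # ... Kubernetes objects, which may reference ${nr-var:chart_version}
```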
The metadata area contains the top-level fields that identify the agent type (its namespace, name and version) and the platform this definition targets.
The version used here is not the version of your agent, but the version of the agent type definition. For example, at the time of writing this we may use version: 0.1.0 for our Infrastructure Agent definition, but the version of the actual Infrastructure Agent binary that AC ends up running as sub-agent would be the most recent one (1.60.1).
Agent types are versioned to ensure compatibility with a given set of configuration values (no breaking changes, see below). As of now, we maintain only one version per agent type and use a fixed 0.1.0 value for it, because these definitions are not easily visible to FC, yet FC needs to know which agent types exist and their versions to make the metadata visible in New Relic's UI. We currently prohibit pushing breaking changes to these definitions, and any exception needs to be validated by at least both the AC and FC teams.
Separately from the agent type version, every definition must declare a top-level protocol_version. This is not a metadata field: it is parsed on its own and versions the agent-type schema language itself, that is, the set of fields and their meaning that Agent Control knows how to parse, including the shape of the metadata block described here. It is decoupled from both the agent type version (semver) and the Agent Control release version. It is a quoted MAJOR.MINOR string (for example "1.0"); the value must be quoted, otherwise YAML interprets an unquoted value such as 1.0 as a float and the field is rejected.
Because it gates the rest of the document, Agent Control reads and validates protocol_version first, at the registry ingestion boundary, before the metadata and the other sections are interpreted. Each Agent Control release understands a single maximum protocol version, and the protocol_version is treated as a single ordered MAJOR.MINOR value. The compatibility rules are:
- Newer than supported (higher `major`, or same `major` with a higher `minor`): rejected. The file is newer than this Agent Control understands.
- Equal to or older than supported: accepted. Agent Control understands every protocol version up to and including the supported one.
For example, an Agent Control that supports protocol version 1.6 accepts everything up to 1.6 (including 0.9 and 1.0 through 1.6) and rejects anything newer (1.7, 2.0, ...).
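The compatibility rules can be illustrated with a few candidate values. The sketch assumes, as in the example above, an Agent Control build whose maximum supported protocol version is 1.6; only the first line is active, the rest are shown as comments.

```yaml
# Assumption: this Agent Control build supports protocol versions up to 1.6.
protocol_version: "1.6"    # accepted: equal to the supported version
# protocol_version: "1.0"  # accepted: same major, older minor
# protocol_version: "0.9"  # accepted: older major
# protocol_version: "1.7"  # rejected: same major, higher minor
# protocol_version: "2.0"  # rejected: higher major
# protocol_version: 1.6    # rejected: unquoted, so YAML reads it as a float
```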
The platform field is required, and operating_system is required when platform: host. The supported combinations are:
- `platform: kubernetes` (no `operating_system`).
- `platform: host` with `operating_system: linux`.
- `platform: host` with `operating_system: windows`.
Any other combination (for example platform: host without an OS, or platform: kubernetes with one) is rejected at parse time.
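For contrast with the valid examples, the two rejected combinations mentioned above would look roughly like this. The other required top-level fields are left out for brevity, and the two cases are written as separate YAML documents.

```yaml
# Rejected at parse time: platform host without operating_system
platform: host
---
# Rejected at parse time: platform kubernetes with operating_system
platform: kubernetes
operating_system: linux
```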
This is an example section for the metadata fields, using the Kubernetes definition of the New Relic Infrastructure Agent.
namespace: newrelic
name: com.newrelic.infrastructure
version: 0.1.0
protocol_version: "1.0"
platform: kubernetes
# ...

The Linux and Windows host variants share the same namespace/name/version and platform: host, and differ only in the operating_system value:
namespace: newrelic
name: com.newrelic.infrastructure
version: 0.1.0
protocol_version: "1.0"
platform: host
operating_system: linux
# ...

This section, defined under the top-level field variables, enables the dynamic configuration of the workload created by AC by exposing arbitrary variables. Variables are declared as a flat tree directly under variables; there are no per-platform sub-keys. If an agent type supports multiple platforms, each per-platform YAML file declares its own variables independently (they may overlap or differ between platforms).
Variables can be arbitrarily grouped into common fields forming a tree, where the final leaf will determine the actual variable, its type and its allowed contents.
Defining variables is entirely optional, but if no variables are defined then no dynamic configuration will be possible for this sub-agent. AC will only be capable of adding or removing it as a workload using its deployment instructions and, at most, the environment variables available to AC at the time it's running (see the deployment section below).
The following is a section of the defined configuration variables for the Kubernetes definition of the New Relic Infrastructure Agent. You can read a detailed explanation below.
variables:
  chart_values:
    newrelic-infrastructure:
      description: "newrelic-infrastructure chart values"
      type: yaml
      required: false
      default: {}
    nri-metadata-injection:
      description: "nri-metadata-injection chart values"
      type: yaml
      required: false
      default: {}
    global:
      description: "Global chart values"
      type: yaml
      required: false
      default: {}
  chart_version:
    description: "nri-bundle chart version"
    type: string
    required: true

Here, chart_values is a grouping field that contains three nested variables (newrelic-infrastructure, nri-metadata-injection and global), while chart_version is a sibling top-level variable.
When referencing these variables elsewhere, as you will see in the deployment and applying configuration sections, you would access these nested fields using a dot (.), as usual for accessing fields in programming languages. For our example, we would use chart_values.newrelic-infrastructure, chart_values.nri-metadata-injection, chart_values.global and chart_version respectively.
The variables can theoretically be nested this way indefinitely, but for usability purposes we advise keeping the nesting at a reasonable depth.
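To make the dot-path rule concrete, here is a small sketch with two levels of grouping. All names in it (`configs`, `some_toggle`, `logging`, `level`) are hypothetical and chosen only for illustration; the comments show the path each leaf would be referenced by.

```yaml
variables:
  configs:                 # grouping field, not a variable itself
    some_toggle:           # leaf, referenced as configs.some_toggle
      description: "Hypothetical feature toggle"
      type: bool
      required: false
      default: false
    logging:               # nested grouping field
      level:               # leaf, referenced as configs.logging.level
        description: "Hypothetical log level"
        type: string
        required: false
        default: "info"
```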
For the leaf nodes of the variable definitions, we currently support the following fields:
`description`: A description of the variable, for documentation purposes.

`type`: The value type that is accepted for this variable. As of now, the following types are supported (using the allowed values for the field):

- `string`.
- `bool`.
- `number`: Integer or floating point are supported.
- `yaml`: An arbitrary YAML value, like an array, an object or even a scalar.
- `map[string]yaml`: A YAML value where the top level is guaranteed to consist of string keys for other values.

`required`: Specifies if providing a value for this variable is required or not. If required is false, a default value of its specified type needs to be provided. If required is true, then a default value cannot be specified.

`default`: A default value for this variable, for the cases where no configuration value has been passed for this variable when creating an instance for the agent type. Its value must be of the same type as the one declared for the variable.
In the case of the yaml variable type, it is recommended to explicitly set a null default value as default: null.
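The interplay between required and default described above can be sketched as follows. The variable names and the port value are hypothetical; the invalid cases are commented out so the fragment itself stays well-formed.

```yaml
variables:
  # Valid: required, so no default is given
  license_key:
    description: "Hypothetical required value"
    type: string
    required: true
  # Valid: optional, so a default of the declared type is given
  health_port:
    description: "Hypothetical optional port"
    type: number
    required: false
    default: 8003
  # Valid: optional yaml variable with an explicit null default
  extra_config:
    description: "Hypothetical free-form YAML"
    type: yaml
    required: false
    default: null
  # Invalid: required together with a default
  # bad_required:
  #   type: string
  #   required: true
  #   default: "x"
  # Invalid: the default does not match the declared type
  # bad_default:
  #   type: bool
  #   required: false
  #   default: "yes"
```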
`variants`: Only available for string variables. A list of accepted values for this variable. If any configuration includes a value for this variable that is not among the specified variants, the configuration will be invalid. The accepted values can be changed in the Agent Control configuration, as in the example below:
Agent type:

my_variable:
  # ...
  type: string
  variants:
    ac_config_field: "my_variable_variants" # If the field is set in `agent_type_var_constraints.variants`, the configured values will be used instead of the default ones.
    values: ["value1", "value2"] # Otherwise the values defined here are used

AC config:

agent_type_var_constraints:
  variants: # map of variants
    my_variable_variants: ["supported_value1", "supported_value2"] # The key should match what is defined in the Agent Type

By default, no variants are set, resulting in no variant validation.
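Continuing the example above, and assuming my_variable is declared at the top level of variables, the configuration values below show what would pass or fail validation when the AC config overrides the variants.

```yaml
# Accepted: with agent_type_var_constraints.variants.my_variable_variants set
# as in the AC config above, the configured list replaces the default values.
my_variable: "supported_value1"

# Rejected under that same AC config, because the defaults from the agent type
# ("value1", "value2") are no longer the ones being checked:
# my_variable: "value1"
```

Without the AC config entry, the situation is reversed: the values listed in the agent type are the accepted ones.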
The deployment area defines how the workload will be created and managed by AC, and it is defined under the top-level field deployment. The shape of deployment depends on the platform declared in the metadata: an on-host definition uses on-host deployment fields (executables, filesystem, packages, …), and a Kubernetes definition uses Kubernetes deployment fields (objects, …). Each per-platform YAML file describes a single deployment block.
The deployment field is required and cannot be empty.
These instructions can be dynamically rendered using as inputs the values for the variables exposed above, environment variables and other internal information exposed by AC. To reference any of these contents we use a template syntax with the form ${<NAMESPACE>:<VARIABLE_REF>}. The NAMESPACE section can have the following values and determines what the VARIABLE_REF section represents:
- `nr-var`: a variable exposed as in the previous section. If you defined a variable called `configs.some_toggle` then you can reference it inside the `deployment` section as `${nr-var:configs.some_toggle}`.
- `nr-env`: environment variables. So, if AC started running with an env var called `MY_ENV` defined, it can be used inside the `deployment` section with `${nr-env:MY_ENV}`.
- `nr-sub`: metadata variables related to the current workload, populated automatically by AC. This includes `agent_id`, a unique, human-friendly identifier of the current workload; the on-host examples in this document also reference `filesystem_agent_dir` and `packages.<package-id>.dir` through this namespace.
- `nr-ac`: global metadata used by AC (see Global metadata list).
We will often use the term configuration values, or just values, for the values given to the variables defined in the variables field of an agent type definition, whether they are provided as local configuration or received as remote configuration for an agent type instance.
All of these variable references are replaced with actual values, taken from the configuration values or, when a value is missing and the variable is not required, from its default. This happens in a rendering stage that creates the final instructions for the deployment.
When adding these values as a user, either as a local config for AC (a file in the filesystem for on-host or a ConfigMap for Kubernetes) or as remote configs made available from FC, the format used is a YAML file with the values following the same tree-like structure defined for the variables in the agent type definition, but the leaf nodes being the actual values.
For examples of this with actual agent type definitions, see kubernetes config examples and host config examples in the official New Relic documentation site.
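As a small illustration of that tree-shaped format, the values below target the Kubernetes variables shown earlier (chart_version and chart_values). The version string and the chart keys (`cluster`, `verboseLog`) are placeholders chosen for this sketch, not values taken from a real chart.

```yaml
# Configuration values: same tree as the variables, with leaves holding values
chart_version: "1.2.3"           # placeholder, required by the definition
chart_values:
  global:
    cluster: "my-cluster"        # hypothetical chart value
  newrelic-infrastructure:
    verboseLog: true             # hypothetical chart value
# chart_values.nri-metadata-injection is omitted, so its default ({}) applies
```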
Any AC environment variable can be referenced within local or remote configuration values using the ${nr-env:<ENVIRONMENT_VARIABLE>} syntax. During the rendering process, AC will resolve the ENVIRONMENT_VARIABLE and replace the placeholder with its corresponding value.
For example, consider the following configuration snippet:
config_agent:
  license_key: ${nr-env:LICENSE_KEY}

In this case, AC will look for an environment variable named LICENSE_KEY and substitute its value into the configuration.
It is important to note that the availability of environment variables depends on the environment where AC is running:
- On-host installations: AC will have access to environment variables configured at the systemd service level. Ensure that any required variables are properly defined in the service configuration.
- Kubernetes deployments: AC will have access to environment variables attached to the AC Pod. These variables can be defined in the Pod's manifest, typically under the `env` section.
By leveraging this mechanism, you can dynamically inject environment-specific values into your configurations, simplifying deployment and ensuring flexibility across different environments.
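For the Kubernetes case, a Pod manifest fragment along these lines would make LICENSE_KEY available to `${nr-env:LICENSE_KEY}`. The container name and the value are placeholders, and how the AC Pod is actually templated in a given installation is not covered here.

```yaml
# Fragment of a Pod spec (standard Kubernetes fields)
spec:
  containers:
    - name: agent-control        # placeholder container name
      env:
        - name: LICENSE_KEY
          value: "REDACTED"      # placeholder value
```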
The following examples show the deployment block for the Linux, Windows and Kubernetes definitions of the New Relic Infrastructure Agent — each in its own per-platform YAML file.
Linux (platform: host, operating_system: linux):
deployment:
  enable_file_logging: ${nr-var:enable_file_logging}
  health:
    interval: 5s
    initial_delay: 5s
    timeout: 5s
    http:
      path: "/v1/status/health"
      port: ${nr-var:health_port}
  packages:
    infra-agent:
      download:
        oci:
          repository: ${nr-var:oci.repository}
          version: ${nr-var:version}
  filesystem:
    config:
      kind: dir
      entries:
        newrelic-infra.yaml:
          kind: file
          text: |
            ${nr-var:config_agent}
    integrations.d:
      kind: dir_content_from_map
      source: ${nr-var:config_integrations}
    logging.d:
      kind: dir_content_from_map
      source: ${nr-var:config_logging}
  executables:
    - id: newrelic-infra
      path: ${nr-sub:packages.infra-agent.dir}/newrelic-infra
      args:
        - --config
        - ${nr-sub:filesystem_agent_dir}/config/newrelic-infra.yaml
      env:
        NRIA_PLUGIN_DIR: "${nr-sub:filesystem_agent_dir}/integrations.d"
        NRIA_LOGGING_CONFIGS_DIR: "${nr-sub:filesystem_agent_dir}/logging.d"
        NRIA_STATUS_SERVER_ENABLED: true
        NRIA_STATUS_SERVER_PORT: "${nr-var:health_port}"
        NR_HOST_ID: "${nr-ac:host_id}"
      restart_policy:
        backoff_strategy:
          type: fixed
          backoff_delay: ${nr-var:backoff_delay}

Windows (platform: host, operating_system: windows):
deployment:
  enable_file_logging: ${nr-var:enable_file_logging}
  health:
    interval: 5s
    initial_delay: 5s
    timeout: 5s
    http:
      path: "/v1/status/health"
      port: ${nr-var:health_port}
  packages:
    infra-agent:
      download:
        oci:
          repository: ${nr-var:oci.repository}
          version: ${nr-var:version}
  filesystem:
    config:
      kind: dir
      entries:
        newrelic-infra.yaml:
          kind: file
          text: |
            ${nr-var:config_agent}
    integrations.d:
      kind: dir_content_from_map
      source: ${nr-var:config_integrations}
    logging.d:
      kind: dir_content_from_map
      source: ${nr-var:config_logging}
  executables:
    - id: newrelic-infra
      path: ${nr-sub:packages.infra-agent.dir}\\newrelic-infra.exe
      args:
        - --config
        - ${nr-sub:filesystem_agent_dir}\\config\\newrelic-infra.yaml
      env:
        NRIA_PLUGIN_DIR: "${nr-sub:filesystem_agent_dir}\\integrations.d"
        NRIA_LOGGING_CONFIGS_DIR: "${nr-sub:filesystem_agent_dir}\\logging.d"
        NRIA_STATUS_SERVER_ENABLED: true
        NRIA_STATUS_SERVER_PORT: "${nr-var:health_port}"
        NR_HOST_ID: "${nr-ac:host_id}"
      restart_policy:
        backoff_strategy:
          type: fixed
          backoff_delay: ${nr-var:backoff_delay}

Kubernetes (platform: kubernetes):
deployment:
  health:
    interval: 30s
    initial_delay: 30s
    checks:
      - namespace: ${nr-ac:namespace}
        name: ${nr-sub:agent_id}
        kind: HelmReleaseWorkload
        target_namespace: ${nr-ac:namespace_agents}
  objects:
    release:
      apiVersion: helm.toolkit.fluxcd.io/v2
      kind: HelmRelease
      metadata:
        name: ${nr-sub:agent_id}
        namespace: ${nr-ac:namespace}
      spec:
        targetNamespace: ${nr-ac:namespace_agents}
        releaseName: ${nr-sub:agent_id}
        interval: 3m
        # ... omitted for brevity
        values:
          newrelic-infrastructure: ${nr-var:chart_values.newrelic-infrastructure}
          nri-metadata-injection: ${nr-var:chart_values.nri-metadata-injection}
          kube-state-metrics: ${nr-var:chart_values.kube-state-metrics}
          nri-kube-events: ${nr-var:chart_values.nri-kube-events}
          global: ${nr-var:chart_values.global}

Some global metadata is available for both on-host and k8s. Be aware that the available metadata differs between the two.
For on-host, we have:
- `host_id`: contains an identifier calculated from the retrieved information about the host, such as the hostname or cloud-related data (when available).
- `filesystem_agent_dir`: contains the absolute path to a dedicated file system directory for this sub-agent. The default value in Linux systems is `/var/lib/newrelic_agent_control/filesystem/<AGENT_ID>`. Note how the agent type definition uses this variable for content added via the `filesystem` field (see below).
For k8s, we have:
- `namespace`: the namespace where Agent Control and Flux will be created.
- `namespace_agents`: the namespace where sub-agents will be created. Due to a limitation in the `k8s-agents-operator`, Instrumentation CRs are created in this namespace too.
The following fields are used for configuring the on-host deployment of a sub-agent.
The executables field holds the instructions to actually run the sub-agent process. Each entry is composed of the following fields:

- `path`: Full path to the executable binary. A string.
- `args`: Command line arguments passed to the executable. An array of strings.
- `env`: A key-value mapping of environment variables and their respective values. Strings.
- `restart_policy`: How the sub-agent should behave if it ends execution. If the limits of this policy are exceeded, the sub-agent will be marked as unhealthy (see Health status below) and will not be restarted anymore. Accepts the following fields:
  - `backoff_strategy`: Timing-related configuration for the restart, to prevent wasteful crash-loops. Accepts the following values:
    - `type`: either `fixed`, `linear` or `exponential`.
    - `backoff_delay`: Time between restarts. This is a time string in the form of `10s`, `1h`, etc.
    - `max_retries`: Maximum number of restart tries. A number.
    - `last_retry_interval`: Time interval after which the back-off state is reset. That is, if the process keeps running for longer than this interval after the restart policy was triggered, restart policy values such as the current number of tries or the back-off delays are reset. This is a time string in the form of `10s`, `1h`, etc.
As of now, the executables field is an array and is actually optional. This was intended to cover the APM agents use case for on-host, in which the agents are not processes but libraries or plugins injected into other processes (customer applications) whose lifecycle AC must not manage (see Agent-less supervisors below). However, this is not yet supported. An agent without executables is accepted as valid, but AC will just spawn an internal supervisor structure for the sub-agent without actually doing anything besides checking health, if it was configured.
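A sketch of an executables entry that uses every restart-related field might look like the following. The binary path, file name and environment variable are hypothetical, the `id` key follows the usage in the deployment examples of this document, and placing `max_retries` and `last_retry_interval` under `backoff_strategy` is an assumption based on the field list above.

```yaml
executables:
  - id: my-agent                         # hypothetical identifier
    path: /usr/local/bin/my-agent        # hypothetical binary
    args:
      - --config
      - ${nr-sub:filesystem_agent_dir}/config/my-agent.yaml
    env:
      LOG_LEVEL: "info"                  # hypothetical variable
    restart_policy:
      backoff_strategy:
        type: exponential                # fixed, linear or exponential
        backoff_delay: 10s               # time between restarts
        max_retries: 5                   # unhealthy once this is exceeded
        last_retry_interval: 1h          # back-off state resets after this
```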
The filesystem field represents the file system configuration for the deployment of a host agent. It consists of a set of directories (map keys) which in turn contain a set of files (nested map keys) with their respective content (map values).
The contents defined here will be written to the sub-agent's dedicated directory for filesystem files, which can be referenced in other fields via the variable ${nr-sub:filesystem_agent_dir}.
The files can be hardcoded, with the contents possibly containing templates, or the whole set of files can be templated, so a directory contains an arbitrary number of files (a place to use a map[string]yaml variable type). The paths cannot be templated individually.
Every directory and every file is declared with a kind, and directory trees are built recursively via an entries: field. A directory's contents can also be templated from a map[string]yaml variable using kind: dir_content_from_map; the map's keys become filenames and the values become file contents.
Each key names a single entry at its own level — it must be a single path segment (a leaf), not a slash-separated sub-path. A nested directory has to be spelled out level by level with explicit kind: dir + entries: blocks; a key such as newrelic-infra/newrelic-integrations/logging is rejected. Declare it as:
newrelic-infra:
  kind: dir
  entries:
    newrelic-integrations:
      kind: dir
      entries:
        logging:
          kind: dir

This applies to projected filenames too: the keys of a map[string]yaml used by dir_content_from_map must also be single segments.
The example below uses these variables:
variables:
  config_agent:
    description: "Newrelic infra configuration"
    type: yaml
    required: false
    default: ""
  config_integrations:
    description: "map of YAML configs for the OHIs"
    type: map[string]yaml
    required: false
    default: {}
  config_logging:
    description: "map of YAML config for logging"
    type: map[string]yaml
    required: false
    default: {}

And this filesystem block:
filesystem:
  newrelic-infra.yaml:
    kind: file
    persistent: true
    text: |
      ${nr-var:config_agent}
  config:
    kind: dir
    persistent: true
  logging.d:
    kind: dir_content_from_map
    source: ${nr-var:config_logging}
  agent:
    kind: dir
    entries:
      data:
        kind: dir
        persistent: true
      integrations.d:
        kind: dir_content_from_map
        source: ${nr-var:config_integrations}
      newrelic-infra.yaml:
        kind: file
        text: |
          ${nr-var:config_agent}

Given these user-supplied values:
config_agent: |
  license_key: REDACTED
  log:
    level: info
config_integrations:
  nri-mysql.yaml: |
    integrations:
      - name: nri-mysql
        env:
          HOSTNAME: localhost
  nri-redis.yaml: |
    integrations:
      - name: nri-redis
        env:
          HOSTNAME: localhost
config_logging:
  syslog.yaml: |
    logs:
      - name: syslog
        file: /var/log/syslog

The runtime produces the following on disk under ${nr-sub:filesystem_agent_dir}. Each kind is shown in isolation.
kind: file: single file rendered from the templated text: field. persistent: true keeps it across agent-control stop and restarts.
newrelic-infra.yaml ← contents from ${nr-var:config_agent}
kind: dir: an explicitly declared directory. With no entries: it's just an (optionally persistent) empty directory; with entries: it builds a tree, where each child is itself any of the three kinds, including another dir, so recursion is uniform.
config/ ← empty, persistent
agent/
├── data/ ← empty, persistent
├── integrations.d/ ← projected from config_integrations (see below)
│ ├── nri-mysql.yaml
│ └── nri-redis.yaml
└── newrelic-infra.yaml ← contents from ${nr-var:config_agent}
kind: dir_content_from_map: a directory whose entries are projected from a map[string]yaml variable at deploy time. Map keys become filenames; map values become file bodies.
logging.d/
└── syslog.yaml ← contents from config_logging["syslog.yaml"]
agent/integrations.d/
├── nri-mysql.yaml ← contents from config_integrations["nri-mysql.yaml"]
└── nri-redis.yaml ← contents from config_integrations["nri-redis.yaml"]
file — a single file with literal or templated content.
| Field | Required | Default | Description |
|---|---|---|---|
| `kind` | yes | - | Must be `file`. |
| `text` | yes | - | File body. May reference `${nr-var:…}` / `${nr-sub:…}`. |
| `persistent` | no | `false` | If `true`, survives sub-agent stop/restart. |
dir — an explicitly declared directory. Its children, if any, live under entries:.
| Field | Required | Default | Description |
|---|---|---|---|
| `kind` | yes | - | Must be `dir`. |
| `entries` | no | `{}` | Map of child entries (any kind). Recursive. Each key must be a single path segment, not a sub-path. |
| `persistent` | no | `false` | If `true`, this directory survives stop/restart. Not inherited: each child is judged by its own `persistent` flag (see Persistence). |
dir_content_from_map — a directory whose set of files is computed at deploy time from a map[string]yaml variable. The map's keys become filenames; the values become file contents.
| Field | Required | Default | Description |
|---|---|---|---|
| `kind` | yes | - | Must be `dir_content_from_map`. |
| `source` | yes | - | Reference to a `map[string]yaml` variable (`${nr-var:…}`). |
Every file and dir entry accepts a boolean persistent: (default false). Two independent mechanisms govern lifecycle:
- The `persistent` flag controls whether the entry's on-disk path is wiped when the tree is cleaned: on sub-agent stop, and just before every (re)write of the tree (start, restart, and config apply). Ephemeral entries are wiped at those points; persistent entries are kept. Wiping before each write means leftover ephemeral content never carries across, even after an ungraceful shutdown (crash/SIGKILL) that skipped the stop-time cleanup.
- The manifest drives reconciliation on every write event. Anything Agent Control wrote on the previous successful write is recorded in the manifest. On the next write, Agent Control diffs the manifest against the new declared set: paths it owned previously and no longer owns are deleted; paths it never owned are left alone.
The flag does not shield the entry from intentional removal: if you delete an entry from the agent type (or remove a key from a dir_content_from_map source map), the manifest diff catches it and the on-disk path is deleted on the next write event.
persistent applies per entry and does not cascade to children. When cleaning (on stop, and before each (re)write), cleanup walks the declared tree: a persistent entry is kept and the walk descends into its children, while an ephemeral entry is deleted together with its entire on-disk subtree (a recursive remove_dir_all, which stops the walk there). So a nested path survives cleanup only if every declared node on the path is persistent: true.
dir_content_from_map has no persistent flag. Agent Control owns and re-renders the projected files on every write, so it is always ephemeral. A persistent: key left in the YAML is silently ignored, so older configs still parse.
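The no-cascade rule is easy to get wrong, so here is a small sketch of how it plays out. The directory names (`cache`, `data`, `state`, `tmp`) are hypothetical; the comments describe what the rules above imply for each entry during cleanup.

```yaml
filesystem:
  cache:                   # ephemeral (persistent defaults to false)
    kind: dir
    entries:
      state:
        kind: dir
        persistent: true   # still wiped: its parent is ephemeral, and the
                           # parent's whole on-disk subtree is removed
  data:
    kind: dir
    persistent: true       # kept, and the walk descends into its children
    entries:
      state:
        kind: dir
        persistent: true   # kept: every declared node on the path is persistent
      tmp:
        kind: dir          # ephemeral: wiped even though data/ is persistent
```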
After every successful write, Agent Control writes .ac-managed-paths.json inside the sub-agent's filesystem directory listing the absolute paths it just wrote. This filename is reserved — agent types must not declare it.
The manifest is the source of truth for "what Agent Control owns." Files the sub-agent process creates at runtime are never in the manifest, so they're invisible to reconciliation: they survive every write event, every sub-agent restart, and every config update. They're only removed if some declared ancestor directory is itself removed from the agent type (the remove_dir_all of the parent takes them as collateral) or if the agent is removed from the fleet.
The manifest stores rendered paths, not agent-type declarations. Reconciliation runs on the tree after variable substitution, so the manifest records the concrete absolute paths Agent Control actually wrote. This matters most for dir_content_from_map: the agent type only names the directory and a source: variable, but at render time each map key is expanded into its own file path, and every one of those paths is recorded individually in the manifest.
As a result, removing a key from the source map is reconciled exactly like deleting a literal entry from the agent type: the rendered path is in the previous manifest but absent from the new declared set, so it is deleted on the next write.
Example. Agent type:
filesystem:
  integrations.d:
    kind: dir_content_from_map
    source: ${nr-var:config_integrations}

The config_integrations variable (a map[string]yaml) supplied at deploy time:

config_integrations:
  nri-mysql.yaml: |
    integrations:
      - name: nri-mysql
  nri-redis.yaml: |
    integrations:
      - name: nri-redis

With ${nr-sub:filesystem_agent_dir} resolving to /var/lib/newrelic-agent-control/filesystem/nr-infra, the write produces integrations.d/nri-mysql.yaml and integrations.d/nri-redis.yaml, and the resulting .ac-managed-paths.json is:

{
  "managed_paths": [
    "/var/lib/newrelic-agent-control/filesystem/nr-infra/integrations.d",
    "/var/lib/newrelic-agent-control/filesystem/nr-infra/integrations.d/nri-mysql.yaml",
    "/var/lib/newrelic-agent-control/filesystem/nr-infra/integrations.d/nri-redis.yaml"
  ]
}

Note the directory plus one entry per rendered map key: it is the variable's content that lands in the manifest, not the source: reference. If the next deploy drops nri-redis.yaml from config_integrations, the new declared set no longer contains …/integrations.d/nri-redis.yaml while the previous manifest still does, so that file is deleted on the next write.
- Ephemeral (`persistent: false`, default). Wiped on sub-agent stop, and again just before every (re)write of the tree (start, restart, config apply); the declared entry itself is then re-created by the write. Leftover content (including files the agent created inside an ephemeral directory) never carries across a restart, even an ungraceful one.
- Persistent (`persistent: true`). Kept on stop and across (re)writes; only `write` re-renders its declared content.
- Removed from fleet. When an agent is removed from the fleet config (via remote config or by being absent at AC startup after a previous deploy), its entire filesystem directory is deleted by `ResourceCleaner`. The `persistent` flag is bypassed.
| Event | Ephemeral (`persistent: false`) | Persistent (`persistent: true`) |
|---|---|---|
| Agent start | Wiped, then reconcile (manifest diff) + write | Kept; reconcile (manifest diff) + write |
| Agent stop | Path deleted | Path kept |
| Agent restart | Wiped, then reconcile + write | Kept; reconcile + write |
| Config update | Wiped, then reconcile + write | Kept; reconcile + write |
| Removed from fleet | Filesystem dir deleted by ResourceCleaner | Filesystem dir deleted by ResourceCleaner |
Agent-process-created files survive a reconcile + write (they're not in the manifest and not declared) except files inside an ephemeral directory, which are wiped along with it on stop and before each (re)write. To keep agent-created content across restarts, place it under a persistent directory.
Defines OCI packages containing the executables and data to be downloaded and installed for the sub-agent. This is a map where keys are package identifiers and values contain package metadata and download configuration.
Each value looks like this:

download:
  oci:
    repository: ${nr-var:oci.repository}
    version: ${nr-var:version}
    public_key_url: https://publickeys.newrelic.com/g/agent-control-oci/global/nrinfraagent/jwks.json

The package version can be:
- A tag (`:v1.0.0`)
- A digest (`@sha256:...`)
- Both tag and digest (`:v1.0.0@sha256:...`); when both are specified the digest takes precedence.
public_key_url is an optional field. When it is not configured, signature verification is skipped and this is logged at warn level.
Warning
The package in the OCI repository MUST follow a specific structure.
Post-Download Hook:
The post_download_hook is an optional field that allows executing a custom script after the package is downloaded and extracted. This is useful for:
- Installing system dependencies
- Compiling native code
- Performing system configuration
- Validating installation requirements
- Running setup scripts that cannot be handled through simple file extraction
The hook runs with a hardcoded timeout of 300 seconds (5 minutes) and is not configurable. If the script exits with a non-zero status code, the package installation fails.
post_download_hook:
  path: /bin/bash # or just "bash" (searches in PATH)
  args:
    - /absolute/path/to/script.sh
    - --arg1
    - --arg2
  env:
    PACKAGE_VERSION: ${nr-var:version}
    CUSTOM_VAR: some-value

Fields:
- `path`: Path to the command/interpreter. Can be absolute (e.g., `/bin/bash`, `C:\Windows\System32\cmd.exe`) or relative (e.g., `bash`, `python3`, `cmd`), in which case it is searched in the system PATH. Required.
- `args`: List of arguments passed to the command. The structure depends on your use case (see examples below). Can be empty for binaries that don't require arguments. Required.
- `env`: Optional map of environment variables passed to the script process.
The script execution environment includes:
- `PACKAGE_DIR`: Automatically set to the package installation directory
- Current working directory: Set to the package directory
- `stdout`: Discarded (to avoid log noise)
- `stderr`: Captured and logged on failure
Note
On Unix systems, if path points to a file, it will be automatically made executable (chmod +x) before execution. This ensures scripts extracted from OCI packages work even if they don't have execute permissions in the archive.
Linux Examples:
# Using bash from PATH with absolute script path
post_download_hook:
  path: bash
  args:
    - /opt/newrelic/install.sh
    - --check-dependencies
  env:
    AGENT_VERSION: ${nr-var:version}

# Using absolute interpreter path
post_download_hook:
  path: /usr/bin/python3
  args:
    - /opt/newrelic/setup.py
    - --install

# Using relative script path (relative to package directory)
post_download_hook:
  path: bash
  args:
    - ./install.sh
    - --verbose

# Direct binary execution without arguments
post_download_hook:
  path: /usr/bin/validate-system
  args: []

Windows Examples:
# Using cmd.exe with /c flag
post_download_hook:
  path: cmd
  args:
    - /c
    - C:\newrelic\install.bat
    - --check-dependencies
  env:
    AGENT_VERSION: ${nr-var:version}
# Using PowerShell
post_download_hook:
  path: powershell
  args:
    - -ExecutionPolicy
    - Bypass
    - -File
    - C:\newrelic\setup.ps1
# Direct batch script execution (Windows can execute .bat/.cmd directly)
post_download_hook:
  path: C:\newrelic\install.bat
  args:
    - --verbose
Complete package example:
packages:
  ebpf-agent:
    download:
      oci:
        repository: ${nr-var:oci.repository}
        version: ${nr-var:version}
    post_download_hook:
      path: bash
      args:
        - ./install.sh
        - --check-dependencies
      env:
        AGENT_VERSION: ${nr-var:version}
Accessing Package Contents:
After installation, the package directory path is available via the reserved variable ${nr-sub:packages.<package-id>.dir}, where <package-id> is the key used in the packages map.
Example:
In this example:
- A package named infra-agent is downloaded from an OCI registry
- The package installation directory is referenced in the executable path using ${nr-sub:packages.infra-agent.dir}
packages:
  infra-agent:
    download:
      oci:
        repository: ${nr-var:oci.repository}
        version: ${nr-var:version}
executables:
  - id: newrelic-infra
    path: ${nr-sub:packages.infra-agent.dir}\\newrelic-infra.exe
When set, this redirects the stdout and stderr of the created process to files inside AC's logging directory (see on-host troubleshooting in the official public documentation). These log files will reside inside a directory dedicated to the current sub-agent, identifiable by its agent_id.
Enables periodically checking the health of the sub-agent. See Health status below for more details. Accepts the following values:
- interval: Periodicity of the check. A duration string.
- initial_delay: Initial delay before the first health check is performed. A duration string.
- timeout: Maximum duration a health check may run before being considered failed.
- http or file: The type of health check used.
  - http means that the supervisor for this sub-agent will attempt to query an HTTP endpoint and will decide on healthiness depending on the status code. Accepts the following fields:
    - host, string.
    - path, string.
    - port, a number.
    - headers: key-value pairs for authentication or other required info.
    - healthy_status_codes: The status codes that mean a healthy state. If not set, as of now the 200s will be considered healthy and the rest unhealthy.
  - file means that the supervisor for this sub-agent will attempt to read a file and find expected contents. Failing to do so, or reading information that means an unhealthy state, will mark the sub-agent as unhealthy. Accepts path as its only field.
If no health configuration is defined, AC treats exceeding the restart policy (if one is defined) as the signal that the sub-agent should be labelled as unhealthy.
Agent Control in Kubernetes uses two distinct namespaces for resource management:
- Agent Control namespace (namespace): This is where Agent Control, Flux, and their supporting resources are created and managed.
- Agents namespace (namespace_agents): This is dedicated to sub-agents and their managed resources. Ideally, Instrumentation CRs should be in the Agent Control namespace, but due to a limitation in the k8s-agents-operator, they must be in the same namespace as the operator.
This separation improves security: agents can't use the Flux or Agent Control Service Accounts, which have wide privileges. When defining agent types or configuring deployments, ensure that resources are created in the correct namespace. The variables ${nr-ac:namespace} and ${nr-ac:namespace_agents} are available for templating these values in your agent type definitions.
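To make the templating idea concrete, here is a minimal sketch of how placeholders such as ${nr-ac:namespace_agents} could be substituted into a string. It is not AC's renderer: the `render` function, the regular expression, the error raised for unknown variables, and the sample namespace values are all assumptions chosen for illustration.

```python
import re

# Matches placeholders such as ${nr-ac:namespace} or ${nr-sub:agent_id}.
PLACEHOLDER = re.compile(r"\$\{(nr-[a-z]+):([^}]+)\}")


def render(template, values):
    """Replace each ${prefix:name} with values[prefix][name].

    Hypothetical helper: unknown variables raise KeyError in this sketch.
    """
    def substitute(match):
        prefix, name = match.group(1), match.group(2)
        try:
            return str(values[prefix][name])
        except KeyError:
            raise KeyError(f"unknown variable ${{{prefix}:{name}}}") from None

    return PLACEHOLDER.sub(substitute, template)


# Illustrative values only; real namespaces depend on the installation.
values = {
    "nr-ac": {"namespace": "agent-control-ns", "namespace_agents": "agents-ns"},
    "nr-sub": {"agent_id": "my-agent"},
}
rendered = render("${nr-ac:namespace_agents}/${nr-sub:agent_id}", values)
```

With the sample values, `rendered` holds the agents namespace and the agent id joined by a slash.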
The following fields are used for configuring the Kubernetes deployment of a sub-agent.
The health configuration for Kubernetes. See Health status below for more details. Accepts the following values:
- interval: Periodicity of the check. A duration string. Defaults to 60s.
- initial_delay: Initial delay before the first health check is performed. A duration string. Defaults to zero.
- checks: An optional list of Kubernetes resources to health-check. If omitted or empty, health checking is disabled for this sub-agent. Each entry accepts:
  - name: The name of the Kubernetes object (supports template variables).
  - namespace: The namespace where the object lives (supports template variables).
  - kind: The kind of resource to check. One of:
    - Deployment, DaemonSet, StatefulSet: checks the named workload directly. If the resource does not exist, the sub-agent is considered healthy (a missing workload is not treated as a failure). Health is computed from the workload's status, e.g. desired vs. available replicas.
    - Instrumentation: checks a New Relic Instrumentation CR. If the resource does not exist, the health check reports an error.
    - HelmReleaseWorkload: checks the named HelmRelease CR plus the Deployment, DaemonSet, and StatefulSet workloads belonging to the release (discovered via the Flux label helm.toolkit.fluxcd.io/name). If the HelmRelease CR does not exist, the health check reports an error.
  - target_namespace: the namespace where the Helm-deployed workloads run. Defaults to namespace. Use this when the HelmRelease installs workloads into a different namespace than the one containing the HelmRelease CR itself.
Example for a Helm-based agent deploying workloads into a separate namespace:
health:
  interval: 30s
  initial_delay: 30s
  checks:
    - namespace: ${nr-ac:namespace}
      name: ${nr-sub:agent_id}
      kind: HelmReleaseWorkload
      target_namespace: ${nr-ac:namespace_agents}
Example for an APM agent using an Instrumentation CR:
health:
  interval: 30s
  initial_delay: 30s
  checks:
    - namespace: ${nr-ac:namespace_agents}
      name: ${nr-sub:agent_id}
      kind: Instrumentation
Example checking individual workload kinds explicitly:
health:
  interval: 30s
  initial_delay: 30s
  checks:
    - namespace: ${nr-ac:namespace_agents}
      name: my-deployment
      kind: Deployment
    - namespace: ${nr-ac:namespace_agents}
      name: my-daemonset
      kind: DaemonSet
    - namespace: ${nr-ac:namespace_agents}
      name: my-statefulset
      kind: StatefulSet
Note
In the example above the agent will be considered unhealthy if any of the corresponding resources is found but its status doesn't meet the workload criteria. This allows supporting agents with configurable workloads.
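The rules for missing resources can be summarised with a short sketch. It assumes each check has already been resolved to either the fetched object or `None`, and it models "reports an error" as an unhealthy result. The function names, the tuple shapes, and the error text are hypothetical and not part of AC.

```python
WORKLOAD_KINDS = {"Deployment", "DaemonSet", "StatefulSet"}
REQUIRED_KINDS = {"Instrumentation", "HelmReleaseWorkload"}


def evaluate_check(kind, resource, evaluate_status):
    """Return (healthy, last_error) for one `checks` entry.

    resource: the object fetched from the API server, or None if it does not exist.
    evaluate_status: kind-specific callable returning (healthy, last_error).
    """
    if resource is None:
        if kind in WORKLOAD_KINDS:
            return True, None  # a missing workload is not treated as a failure
        if kind in REQUIRED_KINDS:
            return False, f"{kind} not found"  # hypothetical error text
        raise ValueError(f"unsupported kind: {kind}")
    return evaluate_status(resource)


def evaluate_checks(checks):
    """checks: list of (kind, resource, evaluate_status) tuples.

    Returns None when there are no checks (health checking disabled),
    otherwise the first unhealthy result or (True, None).
    """
    if not checks:
        return None
    for kind, resource, evaluate_status in checks:
        healthy, last_error = evaluate_check(kind, resource, evaluate_status)
        if not healthy:
            return False, last_error
    return True, None
```

This also shows why configurable workloads are supported: a Deployment that a given configuration never creates simply does not count against the sub-agent.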
Key-value pairs of the Kubernetes Objects to be created by this sub-agent on deployment. The key is an internal identifier of the object, while the value is the object itself which accepts the following values:
- apiVersion, a string.
- kind, a string.
- metadata: Accepting the following:
  - name, a string.
  - namespace, a string.
  - labels: key-value pair of strings representing Kubernetes labels.
- And a collection of arbitrary fields representing the actual data (e.g. the spec) of the object.
Most Agent Control sub-agents currently deploy Flux CRs, which result in a Helm chart installation.
You can check an existing agent type with a Kubernetes deployment as an example. This file includes all necessary Flux CR configurations required for Agent Control to manage sub-agent deployments effectively. It serves as a comprehensive reference for understanding the integration and deployment process.
Before AC can create a sub-agent, it must resolve the agent type referenced in your config (for example newrelic/com.newrelic.infrastructure:0.1.0) to an actual definition. AC looks for that definition in three sources, in a fixed order of precedence. The first source that provides a matching definition wins, so the order is:
- Custom (local) definitions: highest precedence. These are YAML files you place in AC's dynamic agent types directory (on-host: /etc/newrelic-agent-control/dynamic-agent-types), read from disk at startup. A custom definition whose id matches a built-in one overrides the built-in. If two custom files declare the same id, the one whose file name sorts last wins.
  Custom definitions are intended for development and testing only, not for production use. They are a way to iterate on a definition locally before it is shipped as an embedded definition or published to a remote registry.
- Embedded (built-in) definitions: the agent types shipped with AC. They are compiled into the binary, so they are always available with no network or filesystem dependency. The currently embedded definitions are listed in the agent type registry.
- Remote definitions: lowest precedence. If the agent type is not found locally, AC fetches it from an OCI registry.
Only definitions that target the environment of the running binary are considered: the on-host binary only sees host definitions matching its operating system, and the Kubernetes binary only sees kubernetes definitions (see Agent Type Metadata). A definition that targets a different platform is treated as not found, so the lookup falls through to the next source.
This precedence is what makes the custom directory useful for development: you can add a brand-new agent type, or iterate on and override an existing one, simply by dropping a file there — without rebuilding AC or editing the embedded registry — while still falling back to the built-in and remote sources for everything else. For a step-by-step walkthrough of adding a custom on-host agent type, see the development guide in the agent type overview.
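The lookup order described above can be sketched as follows. This is a simplified model under stated assumptions: `Definition`, `Target`, `resolve` and the `fetch_remote` callable are hypothetical names, custom files are modelled as a mapping from file name to definition, and platform filtering is applied before the "file name sorts last" tie-break.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    agent_type_id: str               # e.g. "newrelic/com.newrelic.infrastructure:0.1.0"
    platform: str                    # "host" or "kubernetes"
    operating_system: Optional[str]  # "linux"/"windows" for host, None for kubernetes
    source: str                      # label used only to show where it came from


@dataclass(frozen=True)
class Target:
    platform: str
    operating_system: Optional[str] = None


def matches(definition, target):
    return (definition.platform == target.platform
            and definition.operating_system == target.operating_system)


def resolve(agent_type_id, target, custom_files, embedded, fetch_remote):
    """Return the first matching definition: custom, then embedded, then remote."""
    custom_match = None
    for file_name in sorted(custom_files):
        candidate = custom_files[file_name]
        if candidate.agent_type_id == agent_type_id and matches(candidate, target):
            custom_match = candidate  # later file names overwrite earlier ones
    if custom_match is not None:
        return custom_match

    for candidate in embedded:
        if candidate.agent_type_id == agent_type_id and matches(candidate, target):
            return candidate

    # fetch_remote stands in for the OCI registry lookup; it returns None if absent.
    remote = fetch_remote(agent_type_id)
    if remote is not None and matches(remote, target):
        return remote
    return None
```

A definition for another platform is skipped rather than rejected, so the lookup simply continues with the next source.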
The first time it runs with static configs, or when it is already running and receives remote configuration values from FC, AC will create an internal entity called a supervisor for each of the declared sub-agents. Each of these supervisors has the following responsibilities:
1. Retrieve the configuration available for it, either locally or by listening for remote configs if FC is enabled.
2. Attempt to assemble the actual, effective config that the sub-agent will have.
3. If the assembly is successful, attempt to deploy (spawn process or create Kubernetes resources) the sub-agent using the effective config.
4. Once the sub-agent is deployed:
   - Perform regular health checks.
   - Restart it if it crashes, according to the configured restart policy (for on-host).
   - Ensure that the resources match the ones defined in the agent-type (for k8s).
5. If Fleet Control is enabled, the supervisor will listen for incoming remote configs different from the one currently in use:
   - When receiving one, the supervisor will stop its workload and restart from step 1 again.
   - If an empty config is passed it means that this agent should be retired, so the supervisor will just stop its workload and exit.
6. On failure of assembly or deployment, the supervisor will be kept alive, but will report itself as unhealthy. If FC is enabled, this offers the user the possibility of pushing a new remote config, in case the sub-agent was left in a bad state due to receiving an invalid one.
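The responsibilities listed above can be pictured as a small state holder that reacts to each incoming config. The sketch below is a deliberately simplified model, not AC's supervisor: the `Supervisor` class, its attributes, and the injected `assemble`, `deploy` and `stop` callables are hypothetical, and health is set directly instead of coming from periodic health checks.

```python
class Supervisor:
    """Toy model of a sub-agent supervisor reacting to config changes."""

    def __init__(self, assemble, deploy, stop):
        self.assemble = assemble  # config values -> effective config, may raise
        self.deploy = deploy      # effective config -> None, may raise
        self.stop = stop          # stops the running workload
        self.running = False
        self.healthy = False
        self.retired = False
        self.last_error = None

    def apply(self, config):
        # A new config always stops the current workload first.
        if self.running:
            self.stop()
            self.running = False

        # An empty config means the agent should be retired.
        if not config:
            self.retired = True
            return

        try:
            effective = self.assemble(config)
            self.deploy(effective)
        except Exception as err:
            # The supervisor stays alive but reports itself as unhealthy,
            # so a corrected config can still be pushed later.
            self.healthy = False
            self.last_error = str(err)
            return

        self.running = True
        self.healthy = True  # simplification: real health comes from health checks
        self.last_error = None
```

Calling `apply` again with a corrected config after a failure brings the toy supervisor back to a running state, which mirrors the recovery path described in the last item.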
Agent Control itself shares much of the behavior of a supervisor. That is how, if FC is enabled, it can receive remote configs (mainly the desired list of sub-agents) and apply them.
When we mention a sub-agent's effective config, we actually mean a concept from the OpAMP protocol. It consists of the configuration values that can be received remotely from an OpAMP server, so it does not necessarily (and often just won't) match the configuration of the workload itself. The configuration values are expected to be combined with the agent type definition to render the final instructions on how to run the agents. You can assume that the effective config is more for the supervisor than for the sub-agent itself.
Of course, these values might still contain your observability agent's own config among the rest of the values, but it should not be assumed that these values fully determine and represent the actual state of your agent's config at all times. For example, if your agent is designed in a way that can accept remote configs through other means (like over the network) that take precedence over the config it first runs with or the configs present in files it watches (as these could be rendered by the supervisor), integrating your agent with AC does not make it aware of these other configuration means, so a mismatch of what we call the effective config vs the actual config of your agent is to be expected.
The following flowchart illustrates the config application of a sub-agent via its supervisor, though it omits the health checks and their explicit reporting (along with the effective config) to FC.
%%{ init: { 'theme': 'neutral' }}%%
flowchart TB
classDef central fill:#00E580;
classDef optional stroke-dasharray: 2 2
AC@{ label: Agent Control process }
ACC@{ shape: doc, label: "AC config values
(local or remote)"}
S@{ shape: procs, label: Agent Supervisor }
C@{ shape: doc, label: "Sub-agent config values
(local or remote)"}
T@{ shape: doc, label: Agent type definition}
A@{ shape: diamond, label: Assemble }
G@{ shape: lean-r, label: Assembled Agent }
P@{ label: Config assets }
D@{ shape: diamond, label: Deploy }
DA@{ label: Deployed Agent }
F@{ shape: doc, label: Effective config }
AC -->|reads or listens for| ACC
AC -->|creates| S
S -->|read or listen for| C
S -->|for| T
C & T --> A
A -->|ok| G
G -.->|"persists (if any)"| P
G --> D
G -->|has| F
DA -.->|"reads (if any)"| P
A & D -->|error| S
D --> DA
class AC central
class C central
class ACC central
class T central
class DA central
class F central
class P optional
The health status that AC reports to FC follows the definition of component health used by the OpAMP protocol. Essentially, for each sub-agent we will send the following information:
- If it is healthy or not.
- The time the sub-agent was started (as UNIX time in nanoseconds).
- The time of the last health check (as UNIX time in nanoseconds).
- A status message using agent-specific semantics.
- If the sub-agent is unhealthy, a human-readable error message commonly called last error.
- Optionally, we are capable of sending this same information for arbitrary levels of subcomponents, representing a composite, more granular health. As of now, this information won't be used by FC.
However, we don't offer the same degree of support for agent type authors to populate this information, and in some cases we provide this information internally from AC. Where complete support is offered, the author of the agent type is ultimately responsible for the contents of the health messages built (for example, ensuring the status message uses agent-specific semantics or the error message is human-friendly).
Currently, HTTP support for on-host health checks is kept simple. With the host, path and port provided in the agent type definition, AC will compose a URL and the supervisor for any of these sub-agents will periodically perform an HTTP GET request to it. The response body of this request converted to UTF-8 will be used as the health status message.
If the status code is in the 200s (successful) or is one of the configured healthy status codes, the sub-agent will be reported as healthy. If the request times out or the status code is not one defined as healthy, AC will report the sub-agent as unhealthy, using "Health check failed with HTTP response status code <CODE>" as the last error string.
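A minimal sketch of this decision, kept free of network calls so it is easy to follow. It assumes plain http as the URL scheme and reads the text literally: a code is healthy if it is in the 200s or appears in the configured list. Whether a configured list extends or replaces the default is not settled by the text, and the function names and the returned dictionary are hypothetical.

```python
def health_url(host, port, path):
    """Compose the URL to query. Assumption: plain http, no query string."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def evaluate_http_response(status_code, body, healthy_status_codes=()):
    """Turn an HTTP response into a health result.

    body: raw response bytes, decoded as UTF-8 for the status message.
    """
    healthy = 200 <= status_code < 300 or status_code in healthy_status_codes
    last_error = None
    if not healthy:
        last_error = f"Health check failed with HTTP response status code {status_code}"
    return {
        "healthy": healthy,
        "status": body.decode("utf-8", errors="replace"),
        "last_error": last_error,
    }


result = evaluate_http_response(503, b"warming up")
```

The timeout case is left out of the sketch because no status code is available for it.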
With file-based health checks, a YAML file is expected to be present at the location configured, with the following format:
healthy: false
status: "some agent-specific message"
last_error: "some error message" # optional, for the case healthy == false
start_time_unix_nano: 1725444000
status_time_unix_nano: 1725444001
AC will periodically attempt to read this file and forward its contents as a health message. If healthy is true but last_error has contents, the latter will be ignored and the check will be considered healthy.
The workload is responsible for keeping this file updated over time, as AC will not check that property. It will only parse and propagate the values as the health message.
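As an illustration of how the file contents map to a health message, the sketch below works on an already parsed mapping (YAML parsing itself is left out to keep the example dependency-free). The function name and the choice of which keys are treated as mandatory are assumptions.

```python
def health_from_file_contents(contents):
    """Build a health message from the parsed health file.

    contents: mapping with the keys shown in the file format above.
    """
    healthy = bool(contents["healthy"])
    return {
        "healthy": healthy,
        "status": contents.get("status", ""),
        # last_error is ignored when the file reports a healthy state.
        "last_error": None if healthy else contents.get("last_error"),
        "start_time_unix_nano": contents["start_time_unix_nano"],
        "status_time_unix_nano": contents["status_time_unix_nano"],
    }


message = health_from_file_contents({
    "healthy": True,
    "status": "some agent-specific message",
    "last_error": "stale error that should be ignored",
    "start_time_unix_nano": 1725444000,
    "status_time_unix_nano": 1725444001,
})
```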
The file-based health check is implemented for the New Relic APM agents, and is leveraged internally when running on Kubernetes. See Instrumentation CR and APM for details.
The approaches followed by on-host are not trivial to implement for Kubernetes, and Kubernetes already provides built-in mechanisms to inspect the health of its resources, so AC leverages these built-ins.
Health checking for Kubernetes is driven by the checks entries declared in the agent type's health configuration (see health (Kubernetes) above). Each check targets one resource by name, namespace, and kind, and the operations performed depend on the kind. If no checks are declared, health checking is disabled for the sub-agent.
All these checks are very similar in nature. They mostly involve querying the Kubernetes API server for a certain resource, looking up specific fields of its object representation (like its status or its metadata), and evaluating the values contained within them. The only difference is that the structure of the Instrumentation objects is defined by New Relic, while the remaining ones are defined by Kubernetes itself or by well-known tooling of the Kubernetes ecosystem such as Helm.
For agents that do not define Helm releases, how they work, and the Instrumentation CRD, see Agent-less supervisors below.
If the agent type's deployment section for Kubernetes defines Helm releases, the health check will query different information sources and evaluate their contents with resource-specific logic. If any of these evaluations reports an unhealthy result, the sub-agent will be considered unhealthy.
The sources inspected are listed below:
AC will attempt to retrieve the status field of the Helm release object. Inside this status it will retrieve a list of conditions and check if the ready condition exists and is true, which means healthy. If it's false, it will consider this check unhealthy and emit a message as last error.
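The Helm release evaluation can be sketched on top of the object as a plain dictionary. The sketch relies on the standard Kubernetes condition layout (a `type` and a string `status` of "True", "False" or "Unknown"). Treating a missing Ready condition as unhealthy, and the wording of the error messages, are assumptions of this example.

```python
def helm_release_health(helm_release):
    """Return (healthy, last_error) from a HelmRelease object given as a dict."""
    conditions = helm_release.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") != "Ready":
            continue
        if condition.get("status") == "True":
            return True, None
        # Ready exists but is not true: unhealthy, with the condition message.
        return False, f"HelmRelease not ready: {condition.get('message', '')}"
    # Assumption: no Ready condition at all is also reported as unhealthy.
    return False, "HelmRelease has no Ready condition"
```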
For StatefulSets, AC will set healthy whenever the number of replicas matches the number of ready replicas.
With DaemonSets, the health check will evaluate if the number of pods is the desired one, if no pods are unavailable, and, in case the DaemonSet uses the rolling update strategy, if all pods are running the latest version.
For Deployments, AC will set healthy whenever there are no unavailable replicas.
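The three workload rules can be expressed over the objects' standard Kubernetes fields. The mapping from the prose to concrete fields is an assumption of this sketch (for example, comparing `spec.replicas` with `status.readyReplicas` for StatefulSets, and `currentNumberScheduled` with `desiredNumberScheduled` for DaemonSets); AC may inspect different fields.

```python
def stateful_set_healthy(obj):
    # Healthy when the number of replicas matches the ready replicas.
    desired = obj.get("spec", {}).get("replicas", 1)  # Kubernetes defaults to 1
    ready = obj.get("status", {}).get("readyReplicas", 0)
    return desired == ready


def daemon_set_healthy(obj):
    status = obj.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    if status.get("currentNumberScheduled", 0) != desired:
        return False  # the number of pods is not the desired one
    if status.get("numberUnavailable", 0) > 0:
        return False  # some pods are unavailable
    strategy = obj.get("spec", {}).get("updateStrategy", {}).get("type", "RollingUpdate")
    if strategy == "RollingUpdate" and status.get("updatedNumberScheduled", 0) != desired:
        return False  # not all pods run the latest version
    return True


def deployment_healthy(obj):
    # Healthy when there are no unavailable replicas (an absent field means 0).
    return obj.get("status", {}).get("unavailableReplicas", 0) == 0
```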
The Instrumentation is a custom resource defined by New Relic to represent the status of agents inside Kubernetes, mostly to enable the use case of supervising Agent-less workloads. Much of the information specified here will contain details specific to them and how they work on Kubernetes, mentioning components such as the Kubernetes Operator or the Sidecar. Retrieving the health information of Instrumentations is a completely custom procedure that is not strictly related to Kubernetes beyond the retrieval of the resource itself.
As of now, the health check for Instrumentations involves reading their status value, which should contain the following fields:
- podsMatching is the number of pods which match the Instrumentation.spec.podLabelSelectors and Instrumentation.spec.NamespaceLabelSelectors.
- podsHealthy is the number of pods which match based on podsMatching and podsInjected (see below) and for which the operator (see below) was able to get:
  - The correct pod IP/port.
  - A health response which had a healthy status reported via the YAML field healthy.
  - An HTTP status code of 200.
- podsInjected is the number of pods which matched the Instrumentation based on podsMatching which had the health sidecar injected.
- podsNotReady is the number of pods which match based on podsMatching and podsInjected but are not in a ready state (Pod.status.phase != "Running").
- podsOutdated is the number of pods which match based on podsMatching and podsInjected, but where there's a mismatch between the Instrumentation.generation and the injected pod's annotation (to identify changes to the spec).
- podsUnhealthy is the number of pods which failed a health check, either because the operator couldn't get the pod's IP or port, or because of communication issues, a timeout, a non-200 HTTP status, a failure to decode the HTTP response, or an error reported in the last_error field of the response.
- unhealthyPodsErrors is a list of pods (namespace.name/pod.name) and either the last error from the response or the error from the operator while trying to collect health.
  - pod is the name of the pod.
  - last_error is the error string.
As of now, the logic determining if a sub-agent is healthy is the following:
- podsNotReady must be 0.
- podsUnhealthy must be 0.
- podsHealthy must be more than 0.
- podsMatching must be more than 0.
- podsInjected must be equal to podsMatching.
If unhealthy, the last_error field will be populated with the Instrumentation status' unhealthyPodsErrors field.
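The conditions above translate directly into a small predicate over the Instrumentation status. The field names come from the list above; the function name and the way the unhealthyPodsErrors entries are joined into a single last error string are assumptions of this sketch.

```python
def instrumentation_health(status):
    """Return (healthy, last_error) from an Instrumentation status dict."""
    matching = status.get("podsMatching", 0)
    healthy = (
        status.get("podsNotReady", 0) == 0
        and status.get("podsUnhealthy", 0) == 0
        and status.get("podsHealthy", 0) > 0
        and matching > 0
        and status.get("podsInjected", 0) == matching
    )
    if healthy:
        return True, None
    # Assumption: errors are rendered as "pod: last_error" joined with "; ".
    errors = status.get("unhealthyPodsErrors") or []
    last_error = "; ".join(f"{entry['pod']}: {entry['last_error']}" for entry in errors)
    return False, last_error
```

Note that a status with zero matching pods is reported as unhealthy, since both podsMatching and podsHealthy must be greater than 0.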
If we hadn't mentioned the possibility of agent-less supervisors before, you might have asked yourself if all the agents that AC can support are limited to ones where an actual, stand-alone process (either a traditional one on a server or a Kubernetes-based one such as a pod) is running. After all, the main example of a supported agent type is the Infrastructure Agent, a separate binary that is intended to run alongside the customer's business workload, but separately from it.
It turns out that there are other use cases for which New Relic Control might be useful where a process either does not exist or must not be managed by AC. APM is one of them.
With APM, a customer instruments some existing application by plugging in a shared object, library, or some other plug-in component to the programming language runtime. For example, you could add some Java-specific command line options pointing to the New Relic Java agent's JAR file when running your Java application, so the agent is hooked into the JVM. In this case, the only process is the actual customer application, whose lifecycle must not be managed by AC as opposed to a separate observability agent.
This is why AC supports defining agent types that do not include any actual stand-alone process but otherwise have observability agent functionality, can receive remote configs and expose health information. As of now, this is mostly supported in Kubernetes only, with on-host planned for the future.
The APM use case for Kubernetes is supported by using some additional components that run in the cluster alongside AC. The main one is Kubernetes APM auto-attach (also known as Kubernetes agents operator or just the Kubernetes operator), which is defined as an agent type for AC.
Normally, the Kubernetes operator will intercept API requests for deploying pods onto nodes and, depending on the configuration specified, add the appropriate language agent to the application via an init container. This is achieved by the operator creating the Instrumentation Custom Resource Definition (CRD), so it is later possible to create Instrumentation resources configured to match application pods via pod label selectors. After these Instrumentation resources are created, the operator will inject the init container for the new pods matching these labels.
When used with AC, the operator will also inject a sidecar container next to each application pod. This sidecar has the role of retrieving the health status using a similar method to the file-based health check approach for on-host:
- The language agents inside the application pods will write the health file (or files) into the file system.
- The sidecar will read these files and expose their contents as an HTTP endpoint. If it reads many files, it will coalesce their information into a single health output.
- The Kubernetes operator will fetch the health information from the sidecars and will update the Instrumentation CR's status with the value of this health.
- The health inside the Instrumentation CR is read periodically by AC, as explained above when discussing health in Kubernetes on Instrumentation CR, and then reported to FC.
So, to support APM use cases with AC, we can define an agent type that specifies an Instrumentation CR appropriate for the language. As long as this agent type instance is deployed alongside a single instance of the Kubernetes operator (also an agent type deployable with AC), the auto-instrumentation of applications and the health reporting will work.
As the Instrumentation CR also enables configuring the APM agents, pushing remote configurations from FC is also possible.
You can check our agent type definitions for the currently supported APM languages in our hardcoded registry.
The following diagram reflects the flow of an APM agent being added as a remote config to a Kubernetes cluster where the Kubernetes operator is already deployed. Some of the arrows connecting entities are numbered to represent the timing.
%%{ init: { 'theme': 'neutral' }}%%
flowchart TB
classDef central fill:#00E580;
classDef optional stroke-dasharray: 2 2
FC@{ label: Fleet Control }
AC@{ label: Agent Control }
KO@{ label: Kubernetes Operator agent }
I@{ label: Instrumentation CR }
subgraph POD [Application Pod]
direction TB
APM@{ label: APM Agent init container }
UA@{ label: User application }
S@{ label: Sidecar }
end
class KO central
class I central
class APM central
class S central
FC -->|1 - adds Operator config to| AC
AC -->|2 - creates resources of| KO
KO -->|3 - defines for all languages| I
FC -->|4 - adds APM agent config to| AC
AC -->|5 - adds| I
KO -->|6 - injects| APM & S
APM -->|injects agent| UA
UA -->|writes health| S
KO -->|7 - monitors| S
KO -->|8 - updates status| I
AC -->|manages| KO
AC -->|monitors health| I & KO
AC -->|applies remote configs| I & KO