Ecosystem Edge Hardware Accelerators - NFD

Overview

NFD tests are developed for the purpose of testing the deployment of the Node Feature Discovery (NFD) operator and its respective NodeFeatureDiscovery Custom Resource instance to automatically detect and label hardware and software capabilities of cluster nodes.

The Node Feature Discovery Operator manages the deployment of out-of-tree kernel modules and associated device plug-ins in Kubernetes. It discovers hardware features available on each node in a Kubernetes cluster, and advertises those features using node labels.

Prerequisites and Supported setups

Regular cluster with multiple master nodes (VMs or BMs) and workers (VMs or BMs)/SNO
Public Clouds Cluster (AWS, GCP and Azure)
On Premise Cluster

Test suites:

Name	Description
features	Tests related to NFD operator deployment and feature discovery
2upgrade	Tests related to NFD operator upgrade functionality

Notes:

2upgrade name is used to assure that ginkgo runs that before features.
features contains the main NFD functionality tests including label discovery, pod status checks, and node feature detection.

Internal pkgs

get

Utilities to obtain various objects like pod status, logs, restart count, node feature labels, CPU features, and resource counts.

wait

Helpers for waiting for various states of pods, nodes, and label discovery completion.

set

Helper for setting CPU feature configurations including blacklist/whitelist management and custom NFD worker configurations.

nfdconfig

Configuration management that captures and processes environment variables used for NFD test execution.

nfddelete

Helpers for cleaning up NFD-specific labels from cluster nodes and custom resources after test execution.
custom_resources.go: Comprehensive NFD custom resource cleanup with finalizer handling for operator uninstallation.
feature_labels.go: Node label cleanup utilities for test isolation.

search

Common utilities for searching and string manipulation operations.

nfdhelpersparams

Constants and variables for NFD helper functions including valid pod name lists and container definitions.

params

Configuration constants including default NFD worker configuration templates.

Eco-goinfra pkgs

Inputs and Environment Variables

Parameters for the script are controlled by the following environment variables:

Feature Discovery related

ECO_HWACCEL_NFD_CR_IMAGE: Custom NFD container image to use for NodeFeatureDiscovery CR. If not specified, the default operator image is used - optional
ECO_HWACCEL_NFD_CATALOG_SOURCE: Custom catalog source to be used for NFD operator installation. If not specified, the default "certified-operators" catalog is used - optional
ECO_HWACCEL_NFD_AWS_TESTS: Enable AWS-specific tests including day2 worker node addition. Set to "true" when running NFD tests against AWS clusters - optional
ECO_HWACCEL_NFD_CPU_FLAGS_HELPER_IMAGE: Container image used for CPU flags helper tests. Required when running AWS day2 worker tests - required for AWS tests

Upgrade related

ECO_HWACCEL_NFD_SUBSCRIPTION_NAME: Name of subscription used to deploy the NFD operator - required for upgrade tests
ECO_HWACCEL_NFD_CUSTOM_NFD_CATALOG_SOURCE: Custom catalog source name used for performing operator upgrades - required for upgrade tests
ECO_HWACCEL_NFD_UPGRADE_TARGET_VERSION: Expected version of the operator after upgrade completion - required for upgrade tests

General Test Framework Variables

ECO_TEST_LABELS: ginkgo query passed to the label-filter option for including/excluding tests - optional
ECO_VERBOSE_SCRIPT: prints verbose script information when executing the script - optional
ECO_TEST_VERBOSE: executes ginkgo with verbose test output - optional
ECO_TEST_TRACE: includes full stack trace from ginkgo tests when a failure occurs - optional
ECO_TEST_FEATURES: list of features to be tested. Should include "nfd" for NFD tests - required

In case the required inputs are not set, the tests are skipped.

Running HW-Accel NFD Test Suites

Running NFD Feature Discovery tests

$ export KUBECONFIG=/path/to/kubeconfig
$ export ECO_DUMP_FAILED_TESTS=true
$ export ECO_REPORTS_DUMP_DIR=/tmp/eco-gotests-logs-dir
$ export ECO_TEST_FEATURES="nfd"
$ export ECO_TEST_LABELS='nfd,discovery-of-labels'
$ export ECO_VERBOSE_LEVEL=100
$ export ECO_HWACCEL_NFD_CATALOG_SOURCE="certified-operators"
$ export ECO_HWACCEL_NFD_CR_IMAGE="registry.redhat.io/openshift4/ose-node-feature-discovery:latest"
$ make run-tests

Running NFD tests on AWS with day2 worker addition

$ export KUBECONFIG=/path/to/kubeconfig
$ export ECO_TEST_FEATURES="nfd"
$ export ECO_TEST_LABELS='nfd'
$ export ECO_HWACCEL_NFD_AWS_TESTS=true
$ export ECO_HWACCEL_NFD_CPU_FLAGS_HELPER_IMAGE="<cpu-flags-helper image specific to cluster architecture>"
$ export ECO_HWACCEL_NFD_CATALOG_SOURCE="certified-operators"
$ make run-tests

Running NFD Upgrade tests

$ export KUBECONFIG=/path/to/kubeconfig
$ export ECO_TEST_FEATURES="nfd"
$ export ECO_TEST_LABELS='nfd_upgrade'
$ export ECO_HWACCEL_NFD_SUBSCRIPTION_NAME='nfd-subscription'
$ export ECO_HWACCEL_NFD_UPGRADE_TARGET_VERSION='4.17.0'
$ export ECO_HWACCEL_NFD_CUSTOM_NFD_CATALOG_SOURCE='custom-nfd-catalog'
$ export ECO_HWACCEL_NFD_CATALOG_SOURCE="certified-operators"
$ make run-tests

Test Scenarios

The NFD test suite covers the following scenarios:

Pod Status Verification: Ensures all NFD pods (controller-manager, master, worker, topology) are running correctly
Log Analysis: Checks NFD pod logs for error messages and exceptions
Restart Count Monitoring: Verifies that NFD pods have zero restart counts
Feature Label Discovery: Tests detection and labeling of CPU features, kernel configurations, and hardware capabilities
NUMA Detection: Validates NUMA topology detection and labeling (when supported)
Blacklist/Whitelist Functionality: Tests CPU feature filtering using blacklist and whitelist configurations
Day2 Worker Addition: Tests NFD functionality when adding new worker nodes to AWS clusters
Operator Upgrade: Tests upgrading NFD operator to newer versions

Generic Operator Installer and NFD CR Utilities

The NFD tests now use a modern, clean approach with generic operator installer and standalone custom resource utilities:

Basic NFD Deployment

installConfig := nfdhelpers.GetStandardNFDConfig(apiClient)
installer := deploy.NewOperatorInstaller(installConfig)
err := installer.Install()

// Wait for operator readiness
ready, err := installer.IsReady(5 * time.Minute)

/
crUtils := deploy.NewNFDCRUtils(apiClient, "openshift-nfd")

nfdConfig := deploy.NFDCRConfig{
    EnableTopology: true,
    Image:          "registry.redhat.io/openshift4/ose-node-feature-discovery:latest",
}

err = crUtils.DeployNFDCR("nfd-instance", nfdConfig)

// Wait for CR readiness
crReady, err := crUtils.IsNFDCRReady("nfd-instance", 3*time.Minute)

NFD with Custom Configuration

// Use helper with custom catalog source
options := &nfdhelpers.NFDInstallConfigOptions{
    CatalogSource: nfdhelpers.StringPtr("custom-catalog"),
}
installConfig := nfdhelpers.GetDefaultNFDInstallConfig(apiClient, options)
installer := deploy.NewOperatorInstaller(installConfig)

NFD with Custom Worker Configuration

// Deploy NFD CR with custom worker configuration for CPU features blacklist/whitelist
workerConfig := `
sources:
  cpu:
    cpuid:
      attributeBlacklist: ["BMI1", "BMI2"]
      attributeWhitelist: ["SSE", "SSE2"]
`

err = crUtils.DeployNFDCRWithWorkerConfig("nfd-custom", nfdConfig, workerConfig)

Clean Uninstallation

// Individual CR deletion (for tests)
err := nfdCRUtils.DeleteNFDCR("nfd-instance")

// Complete uninstallation with automatic CR cleanup
uninstallConfig := nfdhelpers.GetDefaultNFDUninstallConfig(
    apiClient,
    "nfd-operator-group",
    "nfd-subscription")
uninstaller := deploy.NewOperatorUninstaller(uninstallConfig)
err = uninstaller.Uninstall()
// ✅ Automatically deletes CRs first (including finalizer handling)
// ✅ Then removes operator, subscription, operator group

Alternative: Direct CR Cleanup

// Direct cleanup of all NFD CRs (useful for tests)
err := nfddelete.AllNFDCustomResources(apiClient, "openshift-nfd")

// Or cleanup specific CRs
err := nfddelete.AllNFDCustomResources(apiClient, "openshift-nfd",
    "nfd-instance", "nfd-instance-custom")

Benefits of the New Approach

Clean Separation: Operator installation is separate from custom resource management
Reusable Components: The generic installer works for NFD, KMM, AMD GPU, and other operators
Standalone Utilities: NFD CR utilities can be used independently of operator installation
Better Error Handling: Comprehensive validation and improved logging for troubleshooting
Simplified Usage: Direct, intuitive API without complex factory patterns
Configuration Helpers: NFD-specific helper functions eliminate repetitive configuration setup
Reduced Duplication: Common NFD patterns are centralized in reusable helper functions

Additional Information

NFD tests verify that hardware features are properly detected and labeled on cluster nodes. The tests support both standard feature detection and custom configurations through worker config modifications. Special attention is given to CPU feature detection, topology awareness, and proper cleanup of labels after test execution.

The new deployment framework provides better reliability and easier maintenance while supporting all existing test scenarios and configurations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ecosystem Edge Hardware Accelerators - NFD

Overview

Prerequisites and Supported setups

Test suites:

Internal pkgs

Eco-goinfra pkgs

Inputs and Environment Variables

Feature Discovery related

Upgrade related

General Test Framework Variables

Running HW-Accel NFD Test Suites

Running NFD Feature Discovery tests

Running NFD tests on AWS with day2 worker addition

Running NFD Upgrade tests

Test Scenarios

Generic Operator Installer and NFD CR Utilities

Basic NFD Deployment

NFD with Custom Configuration

NFD with Custom Worker Configuration

Clean Uninstallation

Alternative: Direct CR Cleanup

Benefits of the New Approach

Additional Information

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Ecosystem Edge Hardware Accelerators - NFD

Overview

Prerequisites and Supported setups

Test suites:

Internal pkgs

Eco-goinfra pkgs

Inputs and Environment Variables

Feature Discovery related

Upgrade related

General Test Framework Variables

Running HW-Accel NFD Test Suites

Running NFD Feature Discovery tests

Running NFD tests on AWS with day2 worker addition

Running NFD Upgrade tests

Test Scenarios

Generic Operator Installer and NFD CR Utilities

Basic NFD Deployment

NFD with Custom Configuration

NFD with Custom Worker Configuration

Clean Uninstallation

Alternative: Direct CR Cleanup

Benefits of the New Approach

Additional Information