This application automatically generates Kubernetes node labels for the Rebellions NPUs available on a node. It uses Node Feature Discovery (NFD) to apply the labels.
Prerequisites:
- Kubernetes >= 1.10
- Nodes with RBLN devices (e.g. ATOM, REBEL) and RBLN driver >= 1.2.79
- NFD deployed, with the local source configured, on each node you want to label
  - When deploying NPU feature discovery with Helm (as described below), NFD can be deployed automatically for you
  - To deploy NFD yourself, see https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/deployment/
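The driver requirement above is a semantic-version comparison on major, minor, and patch. Below is a minimal Rust sketch of such a check. It assumes the installed version string has the `major.minor.patch[-suffix]` shape used in the sample labels (for example `1.2.79-6d00b56-release`). `parse_semver_prefix` and `meets_minimum` are hypothetical helpers, not part of rbln-npu-feature-discovery, and obtaining the installed driver version is left out.

```rust
/// Minimum RBLN driver version from the prerequisites list.
const MIN_DRIVER: (u32, u32, u32) = (1, 2, 79);

/// Hypothetical helper: parses the leading `major.minor.patch` of a version
/// string such as "1.2.79-6d00b56-release", ignoring everything after the
/// first '-'. Returns None unless there are exactly three numeric components.
fn parse_semver_prefix(version: &str) -> Option<(u32, u32, u32)> {
    let core = version.split('-').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components, e.g. "1.2.79.4"
    }
    Some((major, minor, patch))
}

/// Hypothetical helper: true if `version` is at least MIN_DRIVER.
/// Tuples compare lexicographically: major first, then minor, then patch.
fn meets_minimum(version: &str) -> bool {
    parse_semver_prefix(version).map_or(false, |v| v >= MIN_DRIVER)
}

fn main() {
    for v in ["1.2.79-6d00b56-release", "1.2.78", "1.10.0", "not-a-version"] {
        println!("{v}: {}", meets_minimum(v));
    }
}
```

Comparing the numeric tuple rather than the raw string avoids the pitfall where `"1.10.0"` sorts before `"1.2.79"` as text.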
The following assumes you have at least one node in your cluster with Rebellions NPUs and the proper RBLN driver has already been installed on it.
The first step is to make sure that Node Feature Discovery is running on every node you want to label. RBLN NPU Feature Discovery uses NFD's local feature source, so be sure to mount the required volumes (in particular the `features.d` directory). See https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#local-feature-source for more details.
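For reference, NFD's local source reads plain-text feature files from its `features.d` directory, one `<name>=<value>` entry per line. The default `--output-file` of rbln-npu-feature-discovery (`/etc/kubernetes/node-feature-discovery/features.d/rbln-features`) is such a file. The following hedged Rust sketch renders labels into that format. `render_features` is a hypothetical helper, not the project's implementation, and the label values are taken from the sample node output.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: renders labels as `<name>=<value>` lines, the format
/// read by NFD's local feature source. BTreeMap iterates in key order, so the
/// output is deterministic.
fn render_features(labels: &BTreeMap<String, String>) -> String {
    labels
        .iter()
        .map(|(name, value)| format!("{name}={value}\n"))
        .collect()
}

fn main() {
    let mut labels = BTreeMap::new();
    labels.insert("rebellions.ai/npu.family".to_string(), "ATOM".to_string());
    labels.insert("rebellions.ai/npu.count".to_string(), "1".to_string());
    // Prints the file contents; a real labeler would write them into features.d.
    print!("{}", render_features(&labels));
}
```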
You can install NFD with the following command from the NFD quick start guide (https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/get-started/quick-start.html#quick-start):
```shell
kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.17.1"
```

Note: This is a simple static DaemonSet meant to demonstrate the basic features of node-feature-discovery that rbln-npu-feature-discovery needs in order to run successfully. For production deployments, see the instructions below for deployment via Helm.
Please install the RBLN driver on all of your NPU nodes so that rbln-npu-feature-discovery can collect driver-related features correctly.
The next step is to run RBLN NPU Feature Discovery on each node, either as a DaemonSet or as a Job.

To run it as a DaemonSet:

```shell
kubectl apply -f https://raw.githubusercontent.com/rebellions-sw/rbln-npu-feature-discovery/v0.1.0/deployments/static/npu-feature-discovery-daemonset.yaml
```

Note: This is a simple static DaemonSet meant to demonstrate the basic features of rbln-npu-feature-discovery. For production deployments, see the instructions below for deployment via Helm.
To run it as a Job instead, replace the NODE_NAME placeholder in the template with the name of the node you want to label:

```shell
curl https://raw.githubusercontent.com/rebellions-sw/rbln-npu-feature-discovery/v0.1.0/deployments/static/npu-feature-discovery-job.yaml.template \
  | sed "s/NODE_NAME/<your-node-name>/" > npu-feature-discovery-job.yaml
kubectl apply -f npu-feature-discovery-job.yaml
```

Note: This method is intended for testing only and should not be used in production.
With both NFD and rbln-npu-feature-discovery deployed and running, you should now see NPU-related labels on every node that has NPUs installed.
```
$ kubectl get node <npu-node-name> -o yaml
apiVersion: v1
kind: Node
metadata:
  ...
  labels:
    rebellions.ai/driver-version.full: 1.2.79-6d00b56-release
    rebellions.ai/driver-version.major: "1"
    rebellions.ai/driver-version.minor: "2"
    rebellions.ai/driver-version.patch: "79"
    rebellions.ai/driver-version.revision: 6d00b56
    rebellions.ai/npu.count: "1"
    rebellions.ai/npu.family: ATOM
    rebellions.ai/npu.product: RBLN-CA02
```

Available options:
```
$ rbln-npu-feature-discovery --help
RBLN NPU Feature Discovery

Usage: rbln-npu-feature-discovery [OPTIONS]

Options:
      --oneshot                    Label once and exit
      --no-timestamp               Do not add timestamp to the labels
      --sleep-interval <seconds>   Time to sleep between labeling (min: 10s, max: 3600s) [default: 60]
  -o, --output-file <file>         Path to output file [default: /etc/kubernetes/node-feature-discovery/features.d/rbln-features]
  -h, --help                       Print help
  -V, --version                    Print version
```

Below is the list of labels generated by RBLN NPU Feature Discovery and their meanings.
Note: Label values in Kubernetes are always strings. The Value Type column describes how the string value should be interpreted.
| Label Name | Value Type | Meaning | Examples |
|---|---|---|---|
| rebellions.ai/driver-version.full | String | Full version string of the RBLN driver | 1.2.79-6d00b56-release |
| rebellions.ai/driver-version.major | Integer | Major component of the RBLN driver's semantic version | 1 |
| rebellions.ai/driver-version.minor | Integer | Minor component of the RBLN driver's semantic version | 2 |
| rebellions.ai/driver-version.patch | Integer | Patch component of the RBLN driver's semantic version | 79 |
| rebellions.ai/driver-version.revision | String | Revision of the RBLN driver | 6d00b56 |
| rebellions.ai/npu.count | Integer | Number of NPUs | 2 |
| rebellions.ai/npu.family | String | Architecture family of the NPU | ATOM, REBEL |
| rebellions.ai/npu.present | Boolean | Specifies if RBLN NPUs exist on the node | true, false |
| rebellions.ai/npu.product | String | Product name of the NPU | RBLN-CA22, RBLN-CR22 |
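The driver-version labels above are all derived from a single version string. As a hedged sketch, assuming the full version has the layout `<major>.<minor>.<patch>[-<revision>[-<suffix>]]` seen in the sample output (`1.2.79-6d00b56-release`), the split could look like the Rust below. `driver_version_labels` is a hypothetical function, not the project's code, and treating the first dash-separated field after the numeric core as the revision is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical: derives the rebellions.ai/driver-version.* labels from a full
/// driver version string. Assumes `<major>.<minor>.<patch>[-<revision>[-...]]`.
fn driver_version_labels(full: &str) -> Option<BTreeMap<String, String>> {
    // "1.2.79-6d00b56-release" -> core "1.2.79", rest "6d00b56-release"
    let mut dash_parts = full.splitn(2, '-');
    let core = dash_parts.next()?;
    let rest = dash_parts.next();

    // All three numeric components must parse, otherwise give up.
    let nums: Vec<u32> = core
        .split('.')
        .map(|n| n.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;
    if nums.len() != 3 {
        return None;
    }

    let prefix = "rebellions.ai/driver-version";
    let mut labels = BTreeMap::new();
    labels.insert(format!("{prefix}.full"), full.to_string());
    labels.insert(format!("{prefix}.major"), nums[0].to_string());
    labels.insert(format!("{prefix}.minor"), nums[1].to_string());
    labels.insert(format!("{prefix}.patch"), nums[2].to_string());

    // Assumption: the revision is the first dash-separated field after the core.
    if let Some(revision) = rest.and_then(|r| r.split('-').next()).filter(|r| !r.is_empty()) {
        labels.insert(format!("{prefix}.revision"), revision.to_string());
    }
    Some(labels)
}

fn main() {
    if let Some(labels) = driver_version_labels("1.2.79-6d00b56-release") {
        for (name, value) in &labels {
            println!("{name}: {value}");
        }
    }
}
```

Running `main` prints the five `rebellions.ai/driver-version.*` labels in key order with the same values as the sample node output. A version without a suffix, such as `1.2.79`, yields every label except `revision`.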
RBLN NPU Feature Discovery is written in Rust. To build it from source, install the Rust toolchain (including `cargo`) in your development environment.
Clone the source code:

```shell
git clone https://github.com/rebellions-sw/rbln-npu-feature-discovery
cd rbln-npu-feature-discovery
```

Build it:

```shell
cargo build --release
```

Run it:

```shell
./target/release/rbln-npu-feature-discovery
```
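The `--oneshot` and `--sleep-interval` options in the help output decide whether the binary labels once or keeps relabeling. The Rust below is a hedged illustration of that behavior, not the project's source. The function names are hypothetical, and the sketch assumes out-of-range intervals are rejected rather than clamped.

```rust
use std::time::Duration;

/// Bounds stated for --sleep-interval in the help output.
const MIN_SLEEP_SECS: u64 = 10;
const MAX_SLEEP_SECS: u64 = 3600;

/// Hypothetical helper: accepts an interval only inside the documented
/// range. Assumption: out-of-range values are rejected, not clamped.
fn validate_sleep_interval(secs: u64) -> Result<Duration, String> {
    if (MIN_SLEEP_SECS..=MAX_SLEEP_SECS).contains(&secs) {
        Ok(Duration::from_secs(secs))
    } else {
        Err(format!(
            "sleep interval must be between {MIN_SLEEP_SECS}s and {MAX_SLEEP_SECS}s, got {secs}s"
        ))
    }
}

/// Hypothetical labeling loop: with `oneshot`, label once and return
/// (like --oneshot); otherwise relabel forever, sleeping `interval` between runs.
fn run(oneshot: bool, interval: Duration, mut label_once: impl FnMut()) {
    loop {
        label_once();
        if oneshot {
            return;
        }
        std::thread::sleep(interval);
    }
}

fn main() {
    // 60 is the documented default for --sleep-interval.
    let interval = validate_sleep_interval(60).expect("60s is within range");
    let mut runs = 0;
    run(true, interval, || runs += 1);
    println!("labeled {runs} time(s)");
}
```

Rejecting an out-of-range value makes a typo such as `5` fail at startup instead of silently changing the relabeling rate. If the real binary clamps instead, only `validate_sleep_interval` would differ.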