RBLN NPU Feature Discovery

RBLN NPU Feature Discovery automatically generates Kubernetes node labels that describe the Rebellions NPUs available on a node. It relies on Node Feature Discovery (NFD) to apply these labels.

Prerequisites

  • Kubernetes >= 1.10
  • Nodes with RBLN devices (e.g. ATOM, REBEL) and RBLN driver >= 1.2.79
  • NFD deployed on each node you want to label with the local source configured

Quickstart

The following assumes that at least one node in your cluster has Rebellions NPUs and that the proper RBLN driver is already installed on it.

Node Feature Discovery

The first step is to make sure that Node Feature Discovery is running on every node you want to label. RBLN NPU Feature Discovery uses the local source, so be sure the required volumes are mounted. See https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#local-feature-source for more details.

You can install NFD with the following command, taken from the NFD quick start guide (https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/get-started/quick-start.html#quick-start):

kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.17.1"

Note: This is a simple static daemonset meant to demonstrate the basic features required of node-feature-discovery in order to successfully run rbln-npu-feature-discovery. Please see the instructions below for Deployment via helm when deploying in a production setting.

Preparing your NPU Nodes

Install the RBLN driver on all your NPU nodes so that rbln-npu-feature-discovery can collect driver-related features correctly.

Deploy RBLN NPU Feature Discovery

The next step is to run RBLN NPU Feature Discovery on each node as a Daemonset or as a Job.

Daemonset

kubectl apply -f https://raw.githubusercontent.com/rebellions-sw/rbln-npu-feature-discovery/v0.1.0/deployments/static/npu-feature-discovery-daemonset.yaml

Note: This is a simple static daemonset meant to demonstrate the basic features required of rbln-npu-feature-discovery. Please see the instructions below for Deployment via helm when deploying in a production setting.

Job

You must change the NODE_NAME value in the template to match the name of the node you want to label:

curl https://raw.githubusercontent.com/rebellions-sw/rbln-npu-feature-discovery/v0.1.0/deployments/static/npu-feature-discovery-job.yaml.template \
    | sed "s/NODE_NAME/<your-node-name>/" > npu-feature-discovery-job.yaml
kubectl apply -f npu-feature-discovery-job.yaml

Note: This method should only be used for testing and not deployed in a production setting.

Verify Everything Works

With both NFD and rbln-npu-feature-discovery deployed and running, NPU-related labels should now appear on every node that has NPUs installed.

$ kubectl get nodes -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    ...
    labels:
      rebellions.ai/driver-version.full: 1.2.79-6d00b56-release
      rebellions.ai/driver-version.major: "1"
      rebellions.ai/driver-version.minor: "2"
      rebellions.ai/driver-version.patch: "79"
      rebellions.ai/driver-version.revision: 6d00b56
      rebellions.ai/npu.count: "1"
      rebellions.ai/npu.family: ATOM
      rebellions.ai/npu.product: RBLN-CA02

Command line interface

Available options:

$ rbln-npu-feature-discovery --help
RBLN NPU Feature Discovery

Usage: rbln-npu-feature-discovery [OPTIONS]

Options:
      --oneshot                   Label once and exit
      --no-timestamp              Do not add timestamp to the labels
      --sleep-interval <seconds>  Time to sleep between labeling (min: 10s, max: 3600s) [default: 60]
  -o, --output-file <file>        Path to output file [default: /etc/kubernetes/node-feature-discovery/features.d/rbln-features]
  -h, --help                      Print help
  -V, --version                   Print version
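
The default `--output-file` path lies inside NFD's `features.d` directory, which the local source reads as plain `name=value` lines. As a rough illustration only (not the project's actual code), the sketch below renders a label map into that format and writes it through a dot-prefixed temporary file followed by a rename, the pattern the NFD documentation suggests so that a partially written file is never read. The function names `render_feature_file` and `write_feature_file` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Render labels as one `name=value` line each, the format read by the
/// NFD local feature source. Hypothetical helper, not the project's code.
fn render_feature_file(labels: &BTreeMap<String, String>) -> String {
    let mut out = String::new();
    // BTreeMap iterates in sorted key order, so the output is stable.
    for (name, value) in labels {
        out.push_str(name);
        out.push('=');
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Write the feature file via a hidden temporary file in the same
/// directory, then rename it into place. Assumption: the temporary
/// file name `.<name>.tmp` is illustrative only.
fn write_feature_file(path: &Path, contents: &str) -> io::Result<()> {
    let file_name = path.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "output path has no file name")
    })?;
    let tmp = path.with_file_name(format!(".{}.tmp", file_name.to_string_lossy()));
    fs::write(&tmp, contents)?;
    // A rename within the same directory replaces the target in one step.
    fs::rename(&tmp, path)
}
```

Calling `write_feature_file` with the rendered text and a path under `features.d` would leave a file that NFD can pick up on its next scan; a `BTreeMap` is used here only to keep the line order deterministic.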

Generated Labels

Below is the list of the labels generated by RBLN NPU Feature Discovery and their meaning.

Note: Label values in Kubernetes are always of type string. The Value Type column describes the type encoded within that string.

| Label Name | Value Type | Meaning | Examples |
|---|---|---|---|
| rebellions.ai/driver-version.full | String | Full semantic version of RBLN driver | 1.2.79 |
| rebellions.ai/driver-version.major | Integer | Major of the semantic version of RBLN driver | 1 |
| rebellions.ai/driver-version.minor | Integer | Minor of the semantic version of RBLN driver | 2 |
| rebellions.ai/driver-version.patch | Integer | Patch of the semantic version of RBLN driver | 79 |
| rebellions.ai/driver-version.revision | String | Revision of the RBLN driver | 6d00b56 |
| rebellions.ai/npu.count | Integer | Number of NPUs | 2 |
| rebellions.ai/npu.family | String | Architecture family of the NPU | ATOM, REBEL |
| rebellions.ai/npu.present | Boolean | Specifies if RBLN NPUs exist on the node | true, false |
| rebellions.ai/npu.product | String | Product name of the NPU | RBLN-CA22, RBLN-CR22 |
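
The sample output in the verification step suggests that the `driver-version.*` labels are derived from a single version string such as `1.2.79-6d00b56-release`. The following is a minimal sketch of such a split, assuming the format `MAJOR.MINOR.PATCH-REVISION-BUILD`. The format, the `DriverVersion` type, and `parse_driver_version` are assumptions for illustration, not the project's actual implementation.

```rust
/// Components of a driver version string such as "1.2.79-6d00b56-release".
/// Hypothetical type; the field names mirror the label suffixes.
#[derive(Debug, PartialEq)]
struct DriverVersion {
    major: u32,
    minor: u32,
    patch: u32,
    /// Second dash-separated field, if present (assumed to be the revision).
    revision: Option<String>,
}

/// Parse a version string, returning None if the numeric part is not
/// exactly three dot-separated unsigned integers.
fn parse_driver_version(full: &str) -> Option<DriverVersion> {
    // "1.2.79-6d00b56-release" -> "1.2.79", "6d00b56", "release"
    let mut fields = full.trim().split('-');
    let semver = fields.next()?;
    let revision = fields
        .next()
        .filter(|s| !s.is_empty())
        .map(str::to_string);

    let mut nums = semver.split('.').map(|n| n.parse::<u32>());
    let major = nums.next()?.ok()?;
    let minor = nums.next()?.ok()?;
    let patch = nums.next()?.ok()?;
    if nums.next().is_some() {
        return None; // more than three numeric components
    }

    Some(DriverVersion { major, minor, patch, revision })
}
```

For the string shown in the verification step, this sketch yields major `1`, minor `2`, patch `79`, and revision `6d00b56`, which matches the label values listed there. Each value would still be written out as a string, since Kubernetes label values are always strings.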

Development

RBLN NPU Feature Discovery is written in Rust. Install Rust in your development environment before building.

Clone the source code:

git clone https://github.com/rebellions-sw/rbln-npu-feature-discovery

Build it:

cargo build --release

Run it:

./target/release/rbln-npu-feature-discovery
