Merged
README.md: 129 changes (21 additions & 108 deletions)
````diff
@@ -1,128 +1,41 @@
-# Biren GPU Device plugin
-
-## About
-
-The Biren GPU device plugin is as Daemonset that allows you to automatically:
-
-1. Expose the number of GPUs on each nodes for you cluster
-2. Keep track of the health of your GPUs
-3. Run GPU enabled containers in your k8s cluster
-
-This repository contains Biren's official implementation of the [k8s device plugin](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md)
+# Biren Device Plugin
 
-## Prerequisites
-
-The list of prerequisites for running the Biren device plugin is described below:
-
-1. Biren GPU Driver >= 1.2.2
-2. Kubernetes >=1.13
-3. if need mount dri device, need run `modprobe -v vgem` in host which have gpus
-
-## SVI in Device plugin
-
-1. SVI devices will not be created dynamically anywhere within the k8s software stack (GPU must be configured into svi card and split into svi devices priori)
+## Deployment
 
-## SR-IOV in device plugin
-
-1. setup SR-IOV vfio driver
-2. run device plugin with --container-runtime kata
+### Label the Node with `birentech.com=gpu`
+```bash
+kubectl label node {biren-node} birentech.com=gpu
+```
 
-## Quick Start
+### Deploy `biren-device-plugin`
 
-### Build Image
 
+```bash
+kubectl apply -f deploy/biren-device-plugin.yaml
 ```
-make image-build
-```
 
-### Deploy
-
-`kubectl create -f deploy/biren-device-plugin.yaml`
-
-### Running GPU Pods
+### Usage
 
-```
-$ cat <<EOF | kubectl apply -f
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
-  name: gpu-pod
+  name: pod1
 spec:
-  restartPolicy: Never
+  restartPolicy: OnFailure
   containers:
-    - image: ubuntu:20.04
       name: pod1-ctr
       command: ["sleep"]
       args: ["infinity"]
       resources:
         limits:
           birentech.com/gpu: 1
-EOF
 ```
-
-## Command
-
-```
-Biren gpu device plugin
-
-Usage:
-  br-gpu-device-plugin [flags]
-
-Flags:
-      --cdi-feature                enable cdi feature
-      --container-runtime string   the container runtime;runc or kata, default is runc
-  -h, --help                       help for br-gpu-device-plugin
-      --mount-host-path            mount lib and bin folder in host to container, default is false
-      --overwrite-cdi-config       overwrite cdi config
-      --pulse int                  heart beating every seconds
-```
-
-## How to use it
-
-requests
-`birentech.com/gpu: num`
-`birentech.com/1-4-gpu: num`
-`birentech.com/1-2-gpu: num`
-
-## CDI (container device interface) Feature
-
-- https://github.com/cncf-tags/container-device-interface
-
-### Version requirements
-
-- kubelet >= 1.28
-- containerd >= 1.7.0
-
-### How to use it
-
-#### kubelet
-
-In kubelet version 1.28, the CDI feature is in alpha state, so it needs to be enabled manually. To do this, add the `--feature-gates=DevicePluginCDIDevices=true` argument to the kubelet startup command.
-
-#### containerd
-
-Modify the containerd configuration file as follows:
-
-```toml
-[plugins."io.containerd.grpc.v1.cri"]
-  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
-  enable_cdi = true
-```
-
-#### k8s-device-plugin
-
-Add the startup command parameter `--cdi-feature` to enable the CDI feature. If the CDI feature is enabled, this will generate a biren.yaml file in the node's `/etc/cdi` directory, which defines the configuration of CDI. If the startup command parameter includes `--overwrite-cdi-config`, the configuration file will be overwritten each time it starts. Otherwise, if the biren.yaml configuration file already exists, it will not be overwritten.
-
-k8s-device-plugin startup command example:
-
-```yaml
-command:
-  - "/root/k8s-device-plugin"
-args:
-  - "--pulse"
-  - "300"
-  - "--container-runtime"
-  - "runc"
-  - "--cdi-feature" # enable cdi feature
-  - "--overwrite-cdi-config" # overwrite cdi config
-```
+    - image: ubuntu
+      name: pod1-ctr
+      command: ["sleep"]
+      args: ["infinity"]
+      resources:
+        limits:
+          birentech.com/gpu: 1
+```
````
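The CDI section removed from the README above enables CDI on kubelet 1.28 through the command-line argument `--feature-gates=DevicePluginCDIDevices=true`. Some nodes configure kubelet through a config file instead. On those nodes, the same gate can be set in the `featureGates` map of a `KubeletConfiguration`. The sketch below assumes the node's kubelet reads such a file. Its path depends on how the node was provisioned and is not given in this change.

```yaml
# Sketch: enable the CDI device-plugin feature gate in the kubelet config file
# instead of passing --feature-gates on the kubelet command line.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DevicePluginCDIDevices: true  # alpha in kubelet 1.28, per the removed README section
```

Kubelet reads this file at startup, so it must be restarted for the gate to take effect. This only covers the kubelet side. containerd still needs `enable_cdi = true` and the `cdi_spec_dirs` setting from the same README section.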
deploy/biren-device-plugin.yaml: 21 changes (16 additions & 5 deletions)
```diff
@@ -20,7 +20,7 @@ rules:
     resources:
       - nodes
       - pods
-    verbs: ["get", "list", "watch", "update"]
+    verbs: ["get", "list", "watch", "update", "patch"]
 
 ---
 apiVersion: rbac.authorization.k8s.io/v1
@@ -65,11 +65,16 @@ spec:
           effect: NoSchedule
       priorityClassName: "system-node-critical"
       containers:
-        - image: ghcr.io/birentechnology/k8s-device-plugin:v0.7.6
-          name: k8s-device-plugin
+        - name: k8s-device-plugin
+          image: projecthami/biren-device-plugin:latest
+          imagePullPolicy: Always
           env:
             - name: LD_LIBRARY_PATH
-              value: /opt/birentech/lib
+              value: /usr/lib
+            - name: NODE_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: spec.nodeName
           command: ["/root/k8s-device-plugin"]
           args: ["--pulse", "300", "--container-runtime", "runc"]
           securityContext:
@@ -80,7 +85,10 @@ spec:
             - name: sys
               mountPath: /sys
             - name: brml
-              mountPath: /opt/birentech/lib
+              mountPath: /usr/lib
+            - name: brml-lib
+              mountPath: /usr/local/birensupa/driver/biren-smi/lib
+              readOnly: true
             - name: brsmi
               mountPath: /opt/birentech/bin
             - mountPath: /dev
@@ -107,3 +115,6 @@ spec:
         - name: cdi-config
           hostPath:
             path: /etc/cdi
+        - name: brml-lib
+          hostPath:
+            path: /usr/local/birensupa/driver/biren-smi/lib
```
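This manifest starts the plugin with only `--pulse` and `--container-runtime`. The CDI section being removed from the README describes two more flags:

- `--cdi-feature` writes `biren.yaml` into the node's `/etc/cdi`.
- `--overwrite-cdi-config` regenerates that file on every start.

The manifest already declares a `cdi-config` hostPath volume for `/etc/cdi`. Below is a sketch of the container fragment with CDI turned on. It assumes the image built from this change still accepts both flags. The `mountPath` for `cdi-config` is also an assumption, because the matching volumeMount is not visible in this diff.

```yaml
# Sketch: container fragment for deploy/biren-device-plugin.yaml with CDI enabled.
# Assumption: the plugin binary still supports --cdi-feature and --overwrite-cdi-config.
containers:
  - name: k8s-device-plugin
    image: projecthami/biren-device-plugin:latest
    command: ["/root/k8s-device-plugin"]
    args:
      - "--pulse"
      - "300"
      - "--container-runtime"
      - "runc"
      - "--cdi-feature"           # generate the CDI spec (biren.yaml) under /etc/cdi
      - "--overwrite-cdi-config"  # rewrite biren.yaml on every plugin start
    volumeMounts:
      - name: cdi-config
        mountPath: /etc/cdi       # assumption: mount path not shown in this diff
```

If `--overwrite-cdi-config` is left out, an existing `biren.yaml` on the node is kept as-is. That suits hand-edited specs but can leave a stale spec behind after a driver change.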
deploy/example-pod.yaml: 20 changes (8 additions & 12 deletions)
```diff
@@ -1,18 +1,14 @@
 apiVersion: v1
 kind: Pod
 metadata:
-  generateName: pod-
+  name: pod1
 spec:
   restartPolicy: OnFailure
   containers:
-  - image: ubuntu
-    name: pod1-ctr
-    command: ["sleep"]
-    args: ["infinity"]
-    resources:
-      requests:
-        birentech.com/gpu: 4
-      limits:
-        birentech.com/gpu: 4
-        # birentech.com/1-4-gpu: 1
-        # birentech.com/1-2-gpu: 1
+    - image: ubuntu
+      name: pod1-ctr
+      command: ["sleep"]
+      args: ["infinity"]
+      resources:
+        limits:
+          birentech.com/gpu: 1
```
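The old `example-pod.yaml` had commented-out requests for `birentech.com/1-4-gpu` and `birentech.com/1-2-gpu`. The README section removed in this change lists both alongside `birentech.com/gpu`. Below is a sketch of a pod that asks for one `birentech.com/1-4-gpu` slice (judging by the name, presumably a quarter of a card). It assumes the node still advertises that resource, which should be checked in the node's allocatable resources first. The pod name `pod-quarter-gpu` is hypothetical.

```yaml
# Sketch: pod requesting a fractional GPU resource instead of a whole card.
# Assumptions: the node advertises birentech.com/1-4-gpu; the pod name is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: pod-quarter-gpu
spec:
  restartPolicy: OnFailure
  containers:
    - image: ubuntu
      name: pod1-ctr
      command: ["sleep"]
      args: ["infinity"]
      resources:
        limits:
          birentech.com/1-4-gpu: 1  # requests defaults to this limit
```

Like the new example, this sets only `limits`. For extended resources, Kubernetes defaults `requests` to the limit, and the two must be equal if both are given. Dropping the explicit `requests` block therefore does not change how the pod is scheduled.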