Commit 10efba4

Merge pull request #6 from DSFans2014/docs/update_yaml
2 parents 4623324 + 1061dfc commit 10efba4

3 files changed

Lines changed: 45 additions & 125 deletions

README.md

Lines changed: 21 additions & 108 deletions
@@ -1,128 +1,41 @@
-# Biren GPU Device plugin
-
-## About
-
-The Biren GPU device plugin is as Daemonset that allows you to automatically:
-
-1. Expose the number of GPUs on each nodes for you cluster
-2. Keep track of the health of your GPUs
-3. Run GPU enabled containers in your k8s cluster
-
-This repository contains Biren's official implementation of the [k8s device plugin](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md)
+# Biren Device Plugin
 
 ## Prerequisites
 
 The list of prerequisites for running the Biren device plugin is described below:
 
 1. Biren GPU Driver >= 1.2.2
 2. Kubernetes >=1.13
-3. if need mount dri device, need run `modprobe -v vgem` in host which have gpus
-
-## SVI in Device plugin
 
-1. SVI devices will not be created dynamically anywhere within the k8s software stack (GPU must be configured into svi card and split into svi devices priori)
+## Deployment
 
-## SR-IOV in device plugin
-
-1. setup SR-IOV vfio driver
-2. run device plugin with --container-runtime kata
+### Label the Node with `birentech.com=gpu`
+```bash
+kubectl label node {biren-node} birentech.com=gpu
+```
 
-## Quick Start
+### Deploy `biren-device-plugin`
 
-### Build Image
 
+```bash
+kubectl apply -f deploy/biren-device-plugin.yaml
 ```
-make image-build
-```
-
-### Deploy
-
-`kubectl create -f deploy/biren-device-plugin.yaml`
 
-### Running GPU Pods
+### Usage
 
-```
-$ cat <<EOF | kubectl apply -f
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
-  name: gpu-pod
+  name: pod1
 spec:
-  restartPolicy: Never
+  restartPolicy: OnFailure
   containers:
-    - image: ubuntu:20.04
-      name: pod1-ctr
-      command: ["sleep"]
-      args: ["infinity"]
-      resources:
-        limits:
-          birentech.com/gpu: 1
-EOF
-```
-
-## Command
-
-```
-Biren gpu device plugin
-
-Usage:
-  br-gpu-device-plugin [flags]
-
-Flags:
-      --cdi-feature                enable cdi feature
-      --container-runtime string   the container runtime;runc or kata, default is runc
-  -h, --help                       help for br-gpu-device-plugin
-      --mount-host-path            mount lib and bin folder in host to container, default is false
-      --overwrite-cdi-config       overwrite cdi config
-      --pulse int                  heart beating every seconds
-```
-
-## How to use it
-
-requests
-`birentech.com/gpu: num`
-`birentech.com/1-4-gpu: num`
-`birentech.com/1-2-gpu: num`
-
-## CDI (container device interface) Feature
-
-- https://github.com/cncf-tags/container-device-interface
-
-### Version requirements
-
-- kubelet >= 1.28
-- containerd >= 1.7.0
-
-### How to use it
-
-#### kubelet
-
-In kubelet version 1.28, the CDI feature is in alpha state, so it needs to be enabled manually. To do this, add the `--feature-gates=DevicePluginCDIDevices=true` argument to the kubelet startup command.
-
-#### containerd
-
-Modify the containerd configuration file as follows:
-
-```toml
-[plugins."io.containerd.grpc.v1.cri"]
-  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
-  enable_cdi = true
-```
-
-#### k8s-device-plugin
-
-Add the startup command parameter `--cdi-feature` to enable the CDI feature. If the CDI feature is enabled, this will generate a biren.yaml file in the node's `/etc/cdi` directory, which defines the configuration of CDI. If the startup command parameter includes `--overwrite-cdi-config`, the configuration file will be overwritten each time it starts. Otherwise, if the biren.yaml configuration file already exists, it will not be overwritten.
-
-k8s-device-plugin startup command example:
-
-```yaml
-command:
-  - "/root/k8s-device-plugin"
-args:
-  - "--pulse"
-  - "300"
-  - "--container-runtime"
-  - "runc"
-  - "--cdi-feature" # enable cdi feature
-  - "--overwrite-cdi-config" # overwrite cdi config
-```
+    - image: ubuntu
+      name: pod1-ctr
+      command: ["sleep"]
+      args: ["infinity"]
+      resources:
+        limits:
+          birentech.com/gpu: 1
+```
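
The new README stops at submitting a pod that requests `birentech.com/gpu`. One way to confirm that the plugin has registered devices before scheduling such a pod is to read the node's allocatable resources. This is a minimal sketch, not part of the commit: `gpu_allocatable` is a hypothetical helper name, and the node name passed to it is a placeholder for a real node in your cluster.

```bash
#!/usr/bin/env bash
# Sketch only: gpu_allocatable is an illustrative helper, not something the repository ships.
# Prints the number of birentech.com/gpu devices the given node reports as allocatable.
gpu_allocatable() {
  local node="$1"
  # The dots in the resource name are escaped so jsonpath treats it as a single key.
  kubectl get node "$node" -o jsonpath='{.status.allocatable.birentech\.com/gpu}'
}

# Usage (the node name is a placeholder):
#   gpu_allocatable my-biren-node
```

An empty result would suggest that the device plugin pod is not running on that node or has not yet registered with the kubelet.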

deploy/biren-device-plugin.yaml

Lines changed: 16 additions & 5 deletions
@@ -20,7 +20,7 @@ rules:
     resources:
       - nodes
       - pods
-    verbs: ["get", "list", "watch", "update"]
+    verbs: ["get", "list", "watch", "update", "patch"]
 
 ---
 apiVersion: rbac.authorization.k8s.io/v1
@@ -65,11 +65,16 @@ spec:
           effect: NoSchedule
       priorityClassName: "system-node-critical"
       containers:
-        - image: ghcr.io/birentechnology/k8s-device-plugin:v0.7.6
-          name: k8s-device-plugin
+        - name: k8s-device-plugin
+          image: projecthami/biren-device-plugin:latest
+          imagePullPolicy: Always
           env:
             - name: LD_LIBRARY_PATH
-              value: /opt/birentech/lib
+              value: /usr/lib
+            - name: NODE_NAME
+              valueFrom:
+                fieldRef:
+                  fieldPath: spec.nodeName
           command: ["/root/k8s-device-plugin"]
           args: ["--pulse", "300", "--container-runtime", "runc"]
           securityContext:
@@ -80,7 +85,10 @@ spec:
             - name: sys
               mountPath: /sys
             - name: brml
-              mountPath: /opt/birentech/lib
+              mountPath: /usr/lib
+            - name: brml-lib
+              mountPath: /usr/local/birensupa/driver/biren-smi/lib
+              readOnly: true
             - name: brsmi
               mountPath: /opt/birentech/bin
             - mountPath: /dev
@@ -107,3 +115,6 @@ spec:
         - name: cdi-config
           hostPath:
             path: /etc/cdi
+        - name: brml-lib
+          hostPath:
+            path: /usr/local/birensupa/driver/biren-smi/lib
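
This manifest change adds the `patch` verb on `nodes` and `pods`. After applying it, the effective permission can be checked with `kubectl auth can-i`. The sketch below assumes the role is bound to a service account. The diff does not show that account's namespace or name, so both are passed as arguments, and `can_patch` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: the namespace and service account are assumptions supplied by the caller.
# Returns 0 when the service account may patch both nodes and pods, 1 otherwise.
can_patch() {
  local namespace="$1" service_account="$2" resource
  for resource in nodes pods; do
    # kubectl auth can-i prints yes/no and exits non-zero when the answer is no.
    kubectl auth can-i patch "$resource" \
      --as="system:serviceaccount:${namespace}:${service_account}" || return 1
  done
}

# Usage (both arguments are placeholders):
#   can_patch <namespace> <service-account>
```

Impersonating with `--as` requires that the caller is itself allowed to impersonate service accounts, which is typically true for cluster administrators.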

deploy/example-pod.yaml

Lines changed: 8 additions & 12 deletions
@@ -1,18 +1,14 @@
 apiVersion: v1
 kind: Pod
 metadata:
-  generateName: pod-
+  name: pod1
 spec:
   restartPolicy: OnFailure
   containers:
-    - image: ubuntu
-      name: pod1-ctr
-      command: ["sleep"]
-      args: ["infinity"]
-      resources:
-        requests:
-          birentech.com/gpu: 4
-        limits:
-          birentech.com/gpu: 4
-          # birentech.com/1-4-gpu: 1
-          # birentech.com/1-2-gpu: 1
+    - image: ubuntu
+      name: pod1-ctr
+      command: ["sleep"]
+      args: ["infinity"]
+      resources:
+        limits:
+          birentech.com/gpu: 1
