1 | | -# Biren GPU Device plugin |
2 | | - |
3 | | -## About |
4 | | - |
5 | | -The Biren GPU device plugin is as Daemonset that allows you to automatically: |
6 | | - |
7 | | -1. Expose the number of GPUs on each nodes for you cluster |
8 | | -2. Keep track of the health of your GPUs |
9 | | -3. Run GPU enabled containers in your k8s cluster |
10 | | - |
11 | | -This repository contains Biren's official implementation of the [k8s device plugin](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md) |
| 1 | +# Biren Device Plugin |
12 | 2 |
13 | 3 | ## Prerequisites |
14 | 4 |
15 | 5 | The list of prerequisites for running the Biren device plugin is described below: |
16 | 6 |
17 | 7 | 1. Biren GPU Driver >= 1.2.2 |
18 | 8 | 2. Kubernetes >=1.13 |
19 | | -3. if need mount dri device, need run `modprobe -v vgem` in host which have gpus |
20 | | - |
21 | | -## SVI in Device plugin |
22 | 9 |
23 | | -1. SVI devices will not be created dynamically anywhere within the k8s software stack (GPU must be configured into svi card and split into svi devices priori) |
| 10 | +## Deployment |
24 | 11 |
25 | | -## SR-IOV in device plugin |
26 | | - |
27 | | -1. setup SR-IOV vfio driver |
28 | | -2. run device plugin with --container-runtime kata |
| 12 | +### Label the Node with `birentech.com=gpu` |
| 13 | +```bash |
| 14 | +kubectl label node {biren-node} birentech.com=gpu |
| 15 | +``` |
29 | 16 |
30 | | -## Quick Start |
| 17 | +### Deploy `biren-device-plugin` |
31 | 18 |
32 | | -### Build Image |
33 | 19 |
| 20 | +```bash |
| 21 | +kubectl apply -f deploy/biren-device-plugin.yaml |
34 | 22 | ``` |
35 | | -make image-build |
36 | | -``` |
37 | | - |
38 | | -### Deploy |
39 | | - |
40 | | -`kubectl create -f deploy/biren-device-plugin.yaml` |
41 | 23 |
42 | | -### Running GPU Pods |
| 24 | +### Usage |
43 | 25 |
44 | | -``` |
45 | | -$ cat <<EOF | kubectl apply -f |
| 26 | +```yaml |
46 | 27 | apiVersion: v1 |
47 | 28 | kind: Pod |
48 | 29 | metadata: |
49 | | -  name: gpu-pod
| 30 | +  name: pod1
50 | 31 | spec:
51 | | -  restartPolicy: Never
| 32 | +  restartPolicy: OnFailure
52 | 33 |   containers:
53 | | -    - image: ubuntu:20.04
54 | | -      name: pod1-ctr
55 | | -      command: ["sleep"]
56 | | -      args: ["infinity"]
57 | | -      resources:
58 | | -        limits:
59 | | -          birentech.com/gpu: 1
60 | | -EOF |
61 | | -``` |
62 | | - |
63 | | -## Command |
64 | | - |
65 | | -``` |
66 | | -Biren gpu device plugin |
67 | | -
68 | | -Usage: |
69 | | - br-gpu-device-plugin [flags] |
70 | | -
71 | | -Flags: |
72 | | - --cdi-feature enable cdi feature |
73 | | - --container-runtime string the container runtime;runc or kata, default is runc |
74 | | - -h, --help help for br-gpu-device-plugin |
75 | | - --mount-host-path mount lib and bin folder in host to container, default is false |
76 | | - --overwrite-cdi-config overwrite cdi config |
77 | | - --pulse int heart beating every seconds |
78 | | -``` |
79 | | - |
80 | | -## How to use it |
81 | | - |
82 | | -requests |
83 | | -`birentech.com/gpu: num` |
84 | | -`birentech.com/1-4-gpu: num` |
85 | | -`birentech.com/1-2-gpu: num` |
86 | | - |
87 | | -## CDI (container device interface) Feature |
88 | | - |
89 | | -- https://github.com/cncf-tags/container-device-interface |
90 | | - |
91 | | -### Version requirements |
92 | | - |
93 | | -- kubelet >= 1.28 |
94 | | -- containerd >= 1.7.0 |
95 | | - |
96 | | -### How to use it |
97 | | - |
98 | | -#### kubelet |
99 | | - |
100 | | -In kubelet version 1.28, the CDI feature is in alpha state, so it needs to be enabled manually. To do this, add the `--feature-gates=DevicePluginCDIDevices=true` argument to the kubelet startup command. |
101 | | - |
102 | | -#### containerd |
103 | | - |
104 | | -Modify the containerd configuration file as follows: |
105 | | - |
106 | | -```toml |
107 | | -[plugins."io.containerd.grpc.v1.cri"] |
108 | | - cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"] |
109 | | - enable_cdi = true |
110 | | -``` |
111 | | - |
112 | | -#### k8s-device-plugin |
113 | | - |
114 | | -Add the startup command parameter `--cdi-feature` to enable the CDI feature. If the CDI feature is enabled, this will generate a biren.yaml file in the node's `/etc/cdi` directory, which defines the configuration of CDI. If the startup command parameter includes `--overwrite-cdi-config`, the configuration file will be overwritten each time it starts. Otherwise, if the biren.yaml configuration file already exists, it will not be overwritten. |
115 | | - |
116 | | -k8s-device-plugin startup command example: |
117 | | - |
118 | | -```yaml |
119 | | -command: |
120 | | - - "/root/k8s-device-plugin" |
121 | | -args: |
122 | | - - "--pulse" |
123 | | - - "300" |
124 | | - - "--container-runtime" |
125 | | - - "runc" |
126 | | - - "--cdi-feature" # enable cdi feature |
127 | | - - "--overwrite-cdi-config" # overwrite cdi config |
128 | | -``` |
| 34 | +    - image: ubuntu
| 35 | +      name: pod1-ctr
| 36 | +      command: ["sleep"]
| 37 | +      args: ["infinity"]
| 38 | +      resources:
| 39 | +        limits:
| 40 | +          birentech.com/gpu: 1
| 41 | +``` |
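
The Usage manifest added in this change hard-codes the pod name and GPU count. As a minimal sketch of how it could be parameterized, the helper below prints the same manifest with both values filled in. The function name `gpu_pod_manifest` and its two arguments are hypothetical and are not part of the plugin or this repository; only the `birentech.com/gpu` resource name and the manifest fields come from the README.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not shipped with the plugin): print a Pod manifest
# that requests COUNT devices of the birentech.com/gpu resource.
# Arguments: $1 = pod name, $2 = number of GPUs to request.
gpu_pod_manifest() {
  local name="$1" count="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: OnFailure
  containers:
    - image: ubuntu
      name: ${name}-ctr
      command: ["sleep"]
      args: ["infinity"]
      resources:
        limits:
          birentech.com/gpu: ${count}
EOF
}

# Print the manifest for a pod named pod1 that requests one GPU.
gpu_pod_manifest pod1 1
```

To submit the rendered manifest instead of only printing it, the output can be piped to `kubectl apply -f -`. The trailing `-` tells `kubectl` to read the manifest from standard input.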