Commit c4871d1

support biren
Signed-off-by: james <open4pd@4paradigm.com>
1 parent 9e381ab commit c4871d1

6 files changed

Lines changed: 396 additions & 0 deletions

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
---
title: Enable Biren Sharing
---

## Introduction

HAMi now supports sharing Birentech devices, exposed as the `birentech.com/gpu` resource, and provides the following capabilities:

**Full-card and SVI partitioning**: You can allocate either a full card or a partition created with SVI.

**Device UUID selection**: You can specify or exclude particular devices through annotations.

## Using Biren Devices

### Enabling Biren Device Sharing

#### Label the Node

```bash
kubectl label node {biren-node} biren=on
```

#### Deploy the `biren-device-plugin`

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: biren-gpu
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: device-plugin-sa
  namespace: biren-gpu
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: birentech-device-plugin
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - pods
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: birentech-device-plugin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: birentech-device-plugin
subjects:
  - kind: ServiceAccount
    name: device-plugin-sa
    namespace: biren-gpu
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: biren-device-plugin-daemonset
  namespace: biren-gpu
spec:
  selector:
    matchLabels:
      name: biren-device-plugin
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: biren-device-plugin
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: gpu-exporter
    spec:
      nodeSelector:
        birentech.com: gpu
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - key: birentech.com/gpu
          operator: Exists
          effect: NoSchedule
      priorityClassName: "system-node-critical"
      containers:
        - name: k8s-device-plugin
          image: projecthami/biren-device-plugin:latest
          imagePullPolicy: Always
          env:
            - name: LD_LIBRARY_PATH
              value: /usr/lib
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command: ["/root/k8s-device-plugin"]
          args: ["--pulse", "300", "--container-runtime", "runc"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: dp
              mountPath: /var/lib/kubelet/device-plugins
            - name: sys
              mountPath: /sys
            - name: brml
              mountPath: /usr/lib
            - name: brml-lib
              mountPath: /usr/local/birensupa/driver/biren-smi/lib
              readOnly: true
            - name: brsmi
              mountPath: /opt/birentech/bin
            - mountPath: /dev
              name: device
            - name: cdi-config
              mountPath: /etc/cdi
      serviceAccountName: device-plugin-sa
      volumes:
        - name: dp
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: sys
          hostPath:
            path: /sys
        - name: brml
          hostPath:
            path: /usr/lib
        - name: brsmi
          hostPath:
            path: /usr/bin
        - name: device
          hostPath:
            path: /dev
        - name: cdi-config
          hostPath:
            path: /etc/cdi
        - name: brml-lib
          hostPath:
            path: /usr/local/birensupa/driver/biren-smi/lib
```
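
The DaemonSet is only scheduled onto nodes whose labels satisfy its `nodeSelector`: every key/value pair in the selector must be present on the node. The sketch below illustrates that matching rule in plain Python. The function name and the two example label sets are hypothetical and not part of HAMi or the device plugin. Note that, with the values shown in this guide, the label command sets `biren=on` while the manifest selects `birentech.com: gpu`, so it may be worth checking which label your nodes actually carry.

```python
def node_matches_selector(node_labels: dict[str, str], node_selector: dict[str, str]) -> bool:
    """Return True when every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# nodeSelector copied from the DaemonSet manifest in this guide.
daemonset_selector = {"birentech.com": "gpu"}

# Hypothetical label sets for two nodes (illustrative only).
labelled_by_command = {"kubernetes.io/hostname": "node-a", "biren": "on"}
labelled_for_selector = {"kubernetes.io/hostname": "node-b", "birentech.com": "gpu"}

print(node_matches_selector(labelled_by_command, daemonset_selector))    # False
print(node_matches_selector(labelled_for_selector, daemonset_selector))  # True
```

If the first case describes your cluster, either the node label or the `nodeSelector` would need to be adjusted so that the two agree.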

### Run Biren jobs

Request a Biren device through the `resources.limits` section of a container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - image: ubuntu
      name: pod1-ctr
      command: ["sleep"]
      args: ["infinity"]
      resources:
        limits:
          birentech.com/gpu: 1
```
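
If you generate such Pods programmatically, the same manifest can be built as a plain dictionary and serialized to JSON, which `kubectl apply -f -` also accepts. This is a minimal sketch using only the Python standard library. The helper name `biren_pod` and its parameters are hypothetical, and only the field values come from the example above.

```python
import json


def biren_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests `gpus` Biren devices (hypothetical helper)."""
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": f"{name}-ctr",
                    "image": image,
                    # Keep the container alive, as in the YAML example.
                    "command": ["sleep"],
                    "args": ["infinity"],
                    "resources": {"limits": {"birentech.com/gpu": gpus}},
                }
            ]
        },
    }


manifest = biren_pod("pod1", "ubuntu")
print(json.dumps(manifest, indent=2))
```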

## Notes

1. When requesting Biren resources, you cannot specify the memory size.
2. SVI partitioning can only split a single card into either two or four partitions.
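
The partitioning constraint in the second note can be expressed as a small check. The sketch below is illustrative only: the function name is hypothetical, and it assumes every card on a node uses the same split, which this guide does not state.

```python
ALLOWED_SVI_SPLITS = (2, 4)  # per the note above: two or four partitions per card


def svi_partition_count(cards: int, split: int) -> int:
    """Return how many SVI partitions `cards` cards yield when each is split `split` ways."""
    if cards < 0:
        raise ValueError("cards must not be negative")
    if split not in ALLOWED_SVI_SPLITS:
        raise ValueError(f"SVI split must be one of {ALLOWED_SVI_SPLITS}, got {split}")
    return cards * split


print(svi_partition_count(8, 4))  # 32
```

A split of 3, for example, raises `ValueError` instead of returning a count.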
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: Allocate Biren Device
---

This example shows how to request a single Biren device in a plain Kubernetes Pod.
The Pod runs a long-running `ubuntu` container (`sleep infinity`) and requests one `birentech.com/gpu` device through the `resources.limits` section.
You can use this as a starting point and adjust the image and resource limits to fit your own workloads.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - image: ubuntu
      name: pod1-ctr
      command: ["sleep"]
      args: ["infinity"]
      resources:
        limits:
          birentech.com/gpu: 1
```
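
When you adjust the limits, keep in mind that extended resources are requested per container, so a Pod with several containers asks for the sum of their limits. The following sketch totals the `birentech.com/gpu` limits in a manifest held as a dictionary. The function name and the two-container example are hypothetical.

```python
BIREN_RESOURCE = "birentech.com/gpu"


def requested_biren_devices(pod: dict) -> int:
    """Sum the Biren device limits over all regular containers of a Pod manifest."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        total += int(limits.get(BIREN_RESOURCE, 0))
    return total


# Hypothetical Pod: one worker requesting two devices, one sidecar requesting none.
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "pod1"},
    "spec": {
        "containers": [
            {"name": "worker", "image": "ubuntu",
             "resources": {"limits": {BIREN_RESOURCE: 2}}},
            {"name": "sidecar", "image": "ubuntu"},
        ]
    },
}

print(requested_biren_devices(example_pod))  # 2
```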

docs/userguide/device-supported.md

Lines changed: 1 addition & 0 deletions
@@ -16,4 +16,5 @@ The table below lists the devices supported by HAMi:
 | GCU | Enflame | S60 | Yes | Yes | No |
 | XPU | Kunlunxin | P800 | Yes | Yes | No |
 | GPU | Vastai | VA16 | Yes | Yes | No |
+| GPU | Biren | Biren166M | Yes | Yes | No |
 | DPU | Teco | Checking | In progress | In progress | No |
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
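---
title: Enable Biren Device Sharing
---

## Introduction

HAMi now supports sharing Birentech devices, exposed as the `birentech.com/gpu` resource, and provides the following capabilities:

**Full-card and SVI partitioning**: HAMi can use both full cards and cards partitioned with SVI.

**Device UUID selection**: You can specify or exclude particular devices through annotations.

## Using Biren Devices

### Enabling Biren Device Sharing

#### Label the Node

```bash
kubectl label node {biren-node} biren=on
```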

#### Deploy the `biren-device-plugin`

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: biren-gpu
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: device-plugin-sa
  namespace: biren-gpu
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: birentech-device-plugin
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - pods
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: birentech-device-plugin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: birentech-device-plugin
subjects:
  - kind: ServiceAccount
    name: device-plugin-sa
    namespace: biren-gpu
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: biren-device-plugin-daemonset
  namespace: biren-gpu
spec:
  selector:
    matchLabels:
      name: biren-device-plugin
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: biren-device-plugin
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: gpu-exporter
    spec:
      nodeSelector:
        birentech.com: gpu
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - key: birentech.com/gpu
          operator: Exists
          effect: NoSchedule
      priorityClassName: "system-node-critical"
      containers:
        - name: k8s-device-plugin
          image: projecthami/biren-device-plugin:latest
          imagePullPolicy: Always
          env:
            - name: LD_LIBRARY_PATH
              value: /usr/lib
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command: ["/root/k8s-device-plugin"]
          args: ["--pulse", "300", "--container-runtime", "runc"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: dp
              mountPath: /var/lib/kubelet/device-plugins
            - name: sys
              mountPath: /sys
            - name: brml
              mountPath: /usr/lib
            - name: brml-lib
              mountPath: /usr/local/birensupa/driver/biren-smi/lib
              readOnly: true
            - name: brsmi
              mountPath: /opt/birentech/bin
            - mountPath: /dev
              name: device
            - name: cdi-config
              mountPath: /etc/cdi
      serviceAccountName: device-plugin-sa
      volumes:
        - name: dp
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: sys
          hostPath:
            path: /sys
        - name: brml
          hostPath:
            path: /usr/lib
        - name: brsmi
          hostPath:
            path: /usr/bin
        - name: device
          hostPath:
            path: /dev
        - name: cdi-config
          hostPath:
            path: /etc/cdi
        - name: brml-lib
          hostPath:
            path: /usr/local/birensupa/driver/biren-smi/lib
```

### Run Biren Jobs

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - image: ubuntu
      name: pod1-ctr
      command: ["sleep"]
      args: ["infinity"]
      resources:
        limits:
          birentech.com/gpu: 1
```

## Notes

1. When requesting Biren resources, you **cannot** specify the device memory size.
2. With SVI partitioning, a card can only be split into two or four partitions.
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: Allocate Biren Device
---

The following example shows how to request a Biren device in a plain Kubernetes Pod.
The Pod starts a long-running container and declares one `birentech.com/gpu` device in `resources.limits`.
You can build on this by replacing the image, command, or resource limits to fit your own workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  containers:
    - image: ubuntu
      name: pod1-ctr
      command: ["sleep"]
      args: ["infinity"]
      resources:
        limits:
          birentech.com/gpu: 1
```
